Abstract State-of-the-art distributed RDF systems partition
data across multiple computer nodes (workers).
Some systems perform cheap hash partitioning, which 
may result in expensive query evaluation, while others 
apply heuristics aiming at minimizing inter-node com¬ 
munication during query evaluation. This requires an 
expensive data pre-processing phase, leading to high 
startup costs for very large RDF knowledge bases. A priori
knowledge of the query workload has also been used
to create partitions, which however are static and do not 
adapt to workload changes; as a result, inter-node com¬ 
munication cannot be consistently avoided for queries 
that are not favored by the initial data partitioning. 

In this paper, we propose AdHash, a distributed 
RDF system, which addresses the shortcomings of pre¬ 
vious work. First, AdHash applies lightweight partition¬ 
ing on the initial data, that distributes triples by hash¬ 
ing on their subjects; this renders its startup overhead 
low. At the same time, the locality-aware query opti¬ 
mizer of AdHash takes full advantage of the partition¬ 
ing to (i) support the fully parallel processing of join 
patterns on subjects and (ii) minimize data communi¬ 
cation for general queries by applying hash distribution 
of intermediate results instead of broadcasting, wher¬ 
ever possible. Second, AdHash monitors the data access 
patterns and dynamically redistributes and replicates 
the instances of the most frequent ones among workers. 
As a result, the communication cost for future queries is
drastically reduced or even eliminated. To control repli¬ 
cation, AdHash implements an eviction policy for the 
redistributed patterns. Our experiments with synthetic 
and real data verify that AdHash (i) starts faster than 
all existing systems, (ii) processes thousands of queries 
before other systems become online, and (iii) gracefully 
adapts to the query load, being able to evaluate queries 
on billion-scale RDF data in sub-seconds. 

1 Introduction 

The RDF data model does not require a predefined 
schema and represents information from diverse sources 
in a versatile manner. Therefore, social networks, search 
engines, shopping sites and scientific databases are adopt¬ 
ing RDF for publishing Web content. Many large public 
knowledge bases, such as Bio2RDF¹ and YAGO², have
billions of facts in RDF format. RDF datasets consist of 
triples of the form (subject, predicate , object), where 
predicate represents a relationship between two enti¬ 
ties: a subject and an object. An RDF dataset can be 
regarded as a long relational table with three columns. 
An RDF dataset can also be viewed as a directed la¬ 
beled graph, where vertices and edge labels correspond 
to entities and predicates, respectively. Figure 1 shows
an example RDF graph of students and professors in 
an academic network. 

SPARQL³ is the standard query language for RDF.
Each query is a set of RDF triple patterns; some of 
the nodes in a pattern are variables which may appear 
in multiple patterns. For example, the query in Figure 
2(a) returns all professors who work for CS with their

1 http://www.bio2rdf.org/ 

2 http://yago-knowledge.org/ 

3 http://www.w3.org/TR/rdf-sparql-query/ 
Fig. 1 Example RDF graph. An edge and its associated vertices
correspond to an RDF triple; e.g., (Bill, worksFor, CS).


SELECT ?prof ?stud WHERE {
  ?prof worksFor CS .
  ?stud advisor ?prof .
}

(a) SPARQL 



(b) Graph 


Fig. 2 A query that finds CS professors with their advisees. 


advisees. The query corresponds to the graph pattern
in Figure 2(b). The answer is the set of ordered bindings
of (?prof, ?stud) that render the query graph isomorphic
to subgraphs in the data. Assuming the data is stored
in a table D(s, p, o), the query can be answered by first
decomposing it into two subqueries, each corresponding
to a triple pattern: $q_1 = \sigma_{p=\mathit{worksFor} \wedge o=\mathit{CS}}(D)$ and
$q_2 = \sigma_{p=\mathit{advisor}}(D)$. The subqueries can be answered
independently by scanning table D; then, we can join
their intermediate results on the subject and object
attributes: $q_1 \bowtie_{q_1.s = q_2.o} q_2$. By applying the query on the
data of Figure 1, we get (?prof, ?stud) ∈ {(James,
Lisa), (Bill, John), (Bill, Fred), (Bill, Lisa)}.
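
The decomposition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation strategy of any particular system: the triple list is a subset of Figure 1 reconstructed from the triples and answers quoted in the text (an assumption), and the helper names `select` and `join_subject_object` are hypothetical.

```python
# Toy triple table D(s, p, o): a partial reconstruction of Figure 1 (assumption).
D = [
    ("Bill", "worksFor", "CS"),
    ("James", "worksFor", "CS"),
    ("Lisa", "advisor", "James"),
    ("John", "advisor", "Bill"),
    ("Fred", "advisor", "Bill"),
    ("Lisa", "advisor", "Bill"),
]


def select(table, p=None, o=None):
    """Scan the table and keep the triples that match the given constants."""
    return [t for t in table
            if (p is None or t[1] == p) and (o is None or t[2] == o)]


def join_subject_object(left, right):
    """Join on left.s = right.o and return (?prof, ?stud) bindings."""
    students_by_advisor = {}
    for s, _, o in right:
        students_by_advisor.setdefault(o, []).append(s)
    return {(prof, stud)
            for prof, _, _ in left
            for stud in students_by_advisor.get(prof, [])}


q1 = select(D, p="worksFor", o="CS")   # ?prof worksFor CS
q2 = select(D, p="advisor")            # ?stud advisor ?prof
answers = join_subject_object(q1, q2)
```

On this toy table, `answers` contains the same four bindings listed in the text.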

As the volume of RDF data continues soaring, man¬ 
aging, indexing and querying RDF data collections be¬ 
comes challenging. Early research efforts focused on 
building efficient centralized RDF systems; like RDF- 
3X [M], HexaStore [32!, TripleBit [35] and gStore [55] . 
However, centralized data management and search does 
not scale well for complex queries on web-scale RDF 
data. As a result, distributed RDF management sys¬ 
tems were introduced to improve performance. Such 
systems scale-out by partitioning RDF data among many 
computer nodes (workers) and evaluating queries in a 
distributed fashion. A SPARQL query is decomposed 
into multiple subqueries that are evaluated by each 
node independently. Since data is distributed, the nodes 
may need to exchange intermediate results during query 
evaluation. Consequently, queries with large interme¬ 
diate results incur high communication cost, which is 
detrimental to the query performance mm- 

Distributed RDF systems aim at minimizing the 
number of decomposed subqueries by partitioning the 
data among workers. The goal is that each node has 


all the data it needs to evaluate the entire query and 
there is no need for exchanging intermediate results. In 
such a parallel query evaluation, each node contributes 
a partial result of the query; the final query result is 
the union of all partial results. To achieve this, some 
triples may need to be replicated in multiple partitions. 
For example, in Figure 1, assume the data graph is divided
by the dotted line into two partitions and assume
that triples follow their subject placement. To answer 
the query in Figure 2, nodes have to exchange intermediate
results because triples (Lisa, advisor, Bill)
and (Fred, advisor, Bill) cross the partition boundary. 
Replicating these triples in both partitions allows each 
node to answer the query without communication. Still, 
even sophisticated partitioning and replication cannot 
guarantee that arbitrarily complex SPARQL queries 
can be processed in parallel; thus, expensive distributed 
query evaluation, with intermediate results exchanged 
between nodes cannot always be avoided. 

Challenges. Existing distributed RDF systems are facing
two limitations. (i) Partitioning cost: graph partitioning
is an NP-complete problem ST]; thus, existing
systems perform heuristic partitioning. In systems like 
[HI2M2137| that use simple hash partitioning heuris¬ 
tics, queries have low chances to be evaluated in paral¬ 
lel without any communication between nodes. On the 
other hand, systems that use sophisticated partitioning 
heuristics [15, 13, 23, 35] suffer from high preprocessing
cost and sometimes high replication. More importantly, 
they pay the cost of partitioning the entire data regard¬ 
less of the anticipated workloads. However, as shown in 
a recent study mi only a small fraction of the whole 
graph is actually accessed by typical real query work¬ 
loads. For example, a real workload consisting of more 
than 1,600 queries executed on DBpedia (459M triples) 
touches only 0.003% of the whole data. Therefore, we 
argue that distributed RDF systems should leverage 
query workloads in data partitioning. (ii) Adaptivity:
WARP [17] and Partout [12] do consider the workload
during data partitioning and achieve a significant re¬ 
duction in the replication ratio, while showing better 
query performance compared to systems that partition 
the data blindly. Nonetheless, both these systems as¬ 
sume a representative (i.e., static) query workload and 
do not adapt to changes. However, because of workload
diversity and dynamism, Aluç et al. [1] showed that systems
need to continuously adapt to workloads in order
to consistently provide good performance; relying on a 
static workload results in performance degradation for 
queries that are not represented by it. 

In this paper, we propose Adaptive Hashing (AdHash),
a distributed in-memory RDF engine. AdHash
alleviates the aforementioned limitations of existing
systems based on the following key principles.
Lightweight Initial Partitioning: AdHash uses an 
initial hash partitioning, that distributes triples by hash¬ 
ing on their subjects. This partitioning has low cost and 
does not incur any replication. Thus, the preprocessing 
time is low, partially addressing the first challenge. 
Hash-based Locality Awareness: AdHash achieves 
competitive performance by maximizing the number of 
joins that can be executed in parallel without data 
communication by exploiting hash-based locality; the 
join patterns on subjects included in a query can be 
processed in parallel. In addition, intermediate results 
can potentially be hash-distributed to single workers 
instead of being broadcast everywhere. The locality-aware
query optimizer of AdHash considers these properties
to compute an evaluation plan that minimizes
intermediate results shipped between workers. 
Adapting by Incremental Redistribution: AdHash 
monitors the executed workload and incrementally up¬ 
dates a hierarchical heat-map of accessed data patterns. 
Hot patterns are redistributed and potentially repli¬ 
cated in the system in a way that future queries that in¬ 
clude them are executed in parallel by all workers with¬ 
out data communication. To control replication, Ad¬ 
Hash operates within a budget and employs an eviction 
policy for the redistributed patterns. This way, AdHash 
overcomes the limitations of static partitioning schemes 
and adapts dynamically to changing workloads. 

In summary, our contributions are: 

— We introduce AdHash, a distributed SPARQL en¬ 
gine that does not require expensive preprocessing. 
By using lightweight hash partitioning, avoiding the 
upfront cost, and adopting a pay-as-you-go approach, 
AdHash executes tens of thousands of queries on 
large graphs within the time it takes other systems 
to conduct their initial partitioning. 

— We propose a locality-aware query planner and a 
cost-based optimizer for AdHash to efficiently exe¬ 
cute queries that require data communication. 

— We present a novel approach for monitoring and in¬ 
dexing workloads in the form of hierarchical heat 
maps. Queries are transformed and indexed using 
these maps to facilitate the adaptivity of AdHash. 
We introduce an Incremental ReDistribution (IRD) 
technique. Guided by the workload, IRD incremen¬ 
tally redistributes portions of the data that are ac¬ 
cessed by hot patterns. Based on IRD, AdHash pro¬ 
cesses future queries without data communication. 

— We evaluate AdHash using synthetic and real data 
and compare with state-of-the-art systems. AdHash 
partitions billion-scale RDF data and starts answer¬ 
ing queries in less than 14 minutes, while other sys¬ 
tems need hours or days. AdHash executes large 


workloads orders of magnitude faster than existing 
approaches. To the best of our knowledge, AdHash 
is the only system capable of providing sub-second 
execution times for queries with complex structures 
on billion-scale RDF data.

The rest of the paper is organized as follows. Section 2
reviews existing distributed RDF systems and the
techniques used by them for scalable SPARQL query
evaluation. Section 3 presents the architecture of AdHash
and provides an overview of the system's components.
Section 4 discusses our locality-aware query planning
and distributed query evaluation, whereas Section 5
explains the adaptivity feature of AdHash. Section 6
contains the experimental results and Section 7 concludes
the paper.

2 Related Work 

In this section, we review recent distributed RDF systems,
which are related to AdHash. Table 1 summarizes
the main characteristics of these systems.

Lightweight Data Partitioning: Several systems 
are based on the MapReduce framework [3j and use 
the Hadoop Distributed File System (HDFS) to store 
RDF data. HDFS uses horizontal random data partitioning.
SHARD [28] stores the whole RDF data into
one HDFS file. HadoopRDF [TU] also uses HDFS but 
splits the data into multiple smaller files. SHARD and 
HadoopRDF solve SPARQL queries using a set of MapRe¬ 
duce iterations. 

Trinity.RDF [37] is a distributed in-memory RDF 
engine that can handle web scale RDF data. It rep¬ 
resents RDF data in its native graph form (i.e., using 
adjacency lists) and uses a key-value store as the back¬ 
end storage. The RDF graph is partitioned using vertex 
id as hash key. This is equivalent to partitioning the 
data twice; first using subjects as hash keys and sec¬ 
ond using objects. Trinity.RDF uses graph exploration 
for SPARQL query evaluation and relies heavily on its 
underlying high-end InfiniBand interconnect. In every 
iteration, a single subquery is explored starting from 
valid bindings by all workers. This way, generation of 
redundant intermediate results is avoided. However, be¬ 
cause exploration only involves two vertices (source and 
target), Trinity.RDF cannot prune invalid intermediate 
results without carrying all their historical bindings. 
Hence, workers need to ship candidate results to the 
master to finalize the results, which is a potential bot¬ 
tleneck of the system. 

Rya [26] and H2RDF+ [25] use key-value stores for
RDF data storage which range-partition the data based 
on keys such that the keys in each partition are sorted. 
Table 1 Summary of state-of-the-art distributed RDF systems

System            Partitioning Strategy                              Partitioning Cost  Replication  Workload Awareness  Adaptive
TriAD [15]        Graph-based (METIS) & horizontal triple sharding   High               Yes          No                  No
H-RDF-3X [15]     Graph-based (METIS)                                High               Yes          No                  No
Partout [12]      Workload-based horizontal fragmentation            High               No           Yes                 No
SHAPE [22]        Semantic Hash                                      High               Yes          No                  No
Wu et al. [33]    End-to-end path partitioning                       Moderate           Yes          No                  No
Trinity.RDF [37]  Hash                                               Low                Yes          No                  No
H2RDF+ [25]       H-Base partitioner (range)                         Low                No           No                  No
SHARD [28]        Hash                                               Low                No           No                  No
AdHash            Hash                                               Low                Yes          Yes                 Yes


When solving a SPARQL query, Rya executes the first 
subquery using range scan on the appropriate index; 
it then utilizes index lookups for the next subqueries. 
H2RDF+ executes simple queries in a centralized fash¬ 
ion, whereas complex queries are solved using a set of 
MapReduce iterations. 

All the above systems use lightweight partitioning 
schemes, which are computationally inexpensive; how¬ 
ever, queries with long paths and complex structures in¬ 
cur high communication costs. In addition, systems that 
use MapReduce for join evaluation suffer from its high 
overhead [15, 33]. In contrast, although our AdHash
system also uses lightweight hash partitioning, it
avoids excessive data shuffling by exploiting hash-based 
data locality. Furthermore, it adapts incrementally to 
the workload to further minimize communication. 

Sophisticated Partitioning Schemes and Repli¬ 
cation: Several systems employ general graph parti¬ 
tioning techniques to partition RDF data, in order to 
improve data locality. EAGRE [38] focuses on minimiz¬ 
ing the I/O cost. The RDF graph is transformed into 
a compressed entity graph that is partitioned using a 
MinCut algorithm, such as METIS [23]. H-RDF-3X [15]
uses METIS to partition the RDF graph among workers.
It also enforces the so-called k-hop guarantee so
any query with radius k or less can be executed without
communication. Queries with radius larger than k
are executed using expensive MapReduce joins. Replication
increases exponentially with k; therefore, k must
be kept small (e.g., k ≤ 2 in [15]). Both EAGRE and
H-RDF-3X suffer from the significant overhead of
MapReduce-based joins for queries that cannot be evaluated
locally. For such queries, sub-second query evaluation is
not possible even with state-of-the-art MapReduce
implementations, like Hadoop++ [9] and Spark m-

TriAD [15] uses METIS for data partitioning. Edges
that cross partitions are replicated, resulting in a 1-hop
guarantee. A summary graph is defined, which includes 
a vertex for each partition. Vertices in this graph are 
connected by the cross-partition edges. A query in TriAD 
is evaluated against the summary graph first, in order to 


prune partitions that do not contribute to query results. 
Then, the query is evaluated on the RDF data residing 
in the partitions retrieved from the summary graph. 
Multiple join operators are executed concurrently by all 
workers, which communicate via an asynchronous mes¬ 
sage passing protocol. Sophisticated partitioning tech¬ 
niques, like MinCut, reduce the communication cost sig¬ 
nificantly. However, such techniques are prohibitively 
expensive and do not scale for large graphs, as shown 
in [22]. Furthermore, MinCut does not yield good parti¬ 
tioning for dense graphs. Thus, TriAD does not benefit 
from the summary graph pruning technique in dense 
RDF graphs because of the high edge-cut. To alleviate 
METIS overhead, an efficient approach for partitioning 
large graphs was introduced [51]. Nonetheless, there will 
always be SPARQL queries with poor locality that cross 
partition boundaries and result in poor performance. 

SHAPE [22] proposed a semantic hash partitioning
approach for RDF data. SHAPE starts with simple hash
partitioning and employs the same k-hop strategy as
H-RDF-3X [15]. It also relies on the URI hierarchy for
grouping vertices to increase data locality. Similar to
H-RDF-3X, SHAPE suffers from the high overhead of
MapReduce-based joins. Furthermore, URI-based grouping
results in skewed partitioning if a large percentage
of vertices share prefixes. This behavior is noticed in
both real and synthetic datasets (see Section 6).

Recently, Wu et al. [33] proposed an end-to-end path
partitioning scheme, which considers all possible directed
paths in the RDF graph. These paths are merged
in a bottom-up fashion, beginning with the paths' starting
vertices. While this approach works well for star,
chain and directed cyclic queries, other types of queries
result in significant communication. For example, queries
with object-object joins or queries that do not associate
each query vertex with the type predicate would require
inter-worker communication. Note that our adaptivity
technique (Section 5) is orthogonal to and can
be combined with end-to-end path partitioning as well
as other partitioning heuristics to efficiently evaluate
queries that are not favored by the partitioning.
Workload-Aware Data Partitioning: Most of the
aforementioned partitioning techniques focus on minimizing
communication without considering the workload.
A recent study m shows that real query workloads
touch a small fraction of the data. Therefore, utilizing
the query workload helps to reduce communication
costs for queries that cannot be evaluated in parallel,
based on the partitioning scheme used. Partout [12]
is a distributed engine, which relies on a given
workload to divide the data between nodes. It first
extracts representative triple patterns from the query
load. Then, it uses these patterns to partition the data
into fragments and collocates fragments that are accessed
together by queries on the same worker. Similarly,
WARP [17] uses a representative query workload
to replicate frequently accessed data. However, if the
workload changes or the user query is not in the
representative workload, Partout and WARP incur high
communication costs. They can only adapt to changes
in the workload by applying expensive re-partitioning
of the entire data. In contrast, our AdHash system
adapts incrementally by replicating only the data accessed
by the workload, which is small, as discussed above.
SPARQL on Vertex-centric. Sedge |Mj solves the 
problem of dynamic graph partitioning and demonstrates 
its partitioning effectiveness using SPARQL queries over 
RDF. The entire graph is replicated several times and 
each replica is partitioned differently. Every SPARQL 
query is translated manually into a Pregel [23] program
and is executed against the replica that minimizes com¬ 
munication. Still, this approach incurs excessive repli¬ 
cation, as it duplicates the entire data several times. 
Moreover, its lack of support for ad-hoc queries makes 
it counter-productive; a user needs to manually write 
an optimized query evaluation program in Pregel. 

Materialized views: Several works attempt to speed 
up the execution of SPARQL queries by materializing 
a set of views mm or a set of path expressions fTOl . 
The selection of views is based on a representative work¬ 
load. Our approach does not generate local materialized 
views. Instead, we redistribute the data accessed by hot 
patterns in a way that preserves data locality and allows 
queries to be executed with minimal communication. 

Relational Model: There also exist relevant systems 
that focus on data models other than RDF. Schism 
[7] deals with data placement for distributed OLTP 
RDBMS. Using a sample workload, Schism minimizes 
the number of distributed transactions by populating 
a graph of co-accessed tuples. Tuples accessed in the 
same transaction are put in the same server. This is not 
appropriate for SPARQL because some queries access 
large parts of the data that would overwhelm a single
machine. Instead, AdHash exploits parallelism by
executing such a query across all machines in parallel
without communication. H-Store [30] is an in-memory
distributed OLTP RDBMS that uses a data partitioning
technique similar to ours. Nevertheless, H-Store assumes
that the schema and the query workload are
given in advance and assumes no ad-hoc queries.
Although these could be valid assumptions for OLTP
databases, they are not for RDF data stores.

Fig. 3 System architecture of AdHash

Eventual indexing: Idreos et al. [20] introduced the
concept of reducing the data-to-query time for rela¬ 
tional data. They avoid building indices during data 
loading; instead, they reorder tuples incrementally dur¬ 
ing query processing. In AdHash, we extend eventual 
indexing to dynamic and adaptive graph partitioning. 
In our problem, graph partitioning is very expensive; 
hence, the potential benefits of minimizing the data-to- 
query time are substantial. 

3 System Architecture 

AdHash employs the typical master-slave paradigm and 
is deployed on a shared-nothing cluster of machines 
(see Figure 3). The master and workers communicate
through message passing. The same architecture is used
by other systems, e.g., Trinity.RDF [37] and TriAD [15].

3.1 Master 

The master begins by partitioning the data among work¬ 
ers and collecting global statistics. Then, it receives 
queries from users, generates execution plans, coordi¬ 
nates workers, collects results, and returns final results. 
String Dictionary. RDF data contains long strings in 
the form of URIs and literals. To avoid the storage, pro¬ 
cessing, and communication overheads, we encode RDF 
strings into numerical IDs and build a bi-directional
dictionary. This approach is used by state-of-the-art systems.
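
A bi-directional string dictionary of this kind can be sketched as follows. This is a minimal illustration under our own assumptions (sequential IDs, one shared ID space for subjects, predicates, and objects); the class name `StringDictionary` and its methods are hypothetical and are not taken from AdHash.

```python
class StringDictionary:
    """Bi-directional mapping between RDF strings and integer IDs."""

    def __init__(self):
        self._to_id = {}    # string -> ID
        self._to_str = []   # ID -> string (the list index is the ID)

    def encode(self, term):
        """Return the ID of term, assigning the next free ID if unseen."""
        if term not in self._to_id:
            self._to_id[term] = len(self._to_str)
            self._to_str.append(term)
        return self._to_id[term]

    def decode(self, term_id):
        """Return the original string for a previously assigned ID."""
        return self._to_str[term_id]

    def encode_triple(self, triple):
        """Encode a (subject, predicate, object) triple of strings."""
        return tuple(self.encode(term) for term in triple)


dictionary = StringDictionary()
encoded = dictionary.encode_triple(("Bill", "worksFor", "CS"))
```

Queries are then processed entirely on the integer IDs, and results are decoded back to strings only when they are returned to the user.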

Data Partitioner. A recent study m showed that 
joins on the subject column account for 60% of the 
joins in a real workload of SPARQL queries. Therefore,
AdHash uses lightweight hash-based triple sharding on
subject values. Given W workers, a triple t is assigned
to worker $w_i$, where i is the result of a hash function
applied on t.subject.⁴ This way, all triples that share
the same subject will be assigned to the same worker.
Consequently, any star query joining on subjects can 
be evaluated without communication among workers. 
We do not hash on objects because they can be literals 
and common types. Hashing on objects would assign 
all triples of the same type to one worker, resulting 
in load imbalance and limited parallelism [18]. To validate
our argument, we use the synthetic LUBM-4000⁵
and real YAGO2⁶ datasets, which have around 500M
and 300M triples, respectively. Both datasets are partitioned
among 1,024 partitions using 3 methods: (i)
hashing on subjects, (ii) hashing on objects, and (iii)
random partitioning. Table 2 shows statistics about the
triples distribution among partitions for each method.
Hashing on objects results in severely imbalanced parti¬ 
tions, whereas random partitioning and hashing on the 
subjects result in balanced partitions. We do not use 
random partitioning because it destroys data locality. 
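
The effect of the hashing key can be illustrated with a small sketch. It assumes ID-encoded triples and the simple `mod W` assignment from footnote 4; the function names and the toy triples are hypothetical and are not the datasets used in Table 2.

```python
from collections import defaultdict


def hash_partition(triples, num_workers, position=0):
    """Assign each ID-encoded triple to worker (key mod W).

    position=0 hashes on the subject, position=2 on the object.
    """
    partitions = defaultdict(list)
    for triple in triples:
        partitions[triple[position] % num_workers].append(triple)
    return partitions


def partition_sizes(partitions, num_workers):
    """Number of triples held by each worker, in worker order."""
    return [len(partitions.get(i, [])) for i in range(num_workers)]


# Hypothetical encoded triples: four subjects (1 to 4), each with one
# type-like triple pointing at the common object 7 and one other triple.
triples = [
    (1, 100, 7), (1, 101, 8),
    (2, 100, 7), (2, 101, 9),
    (3, 100, 7), (3, 101, 8),
    (4, 100, 7), (4, 101, 9),
]

by_subject = hash_partition(triples, num_workers=2, position=0)
by_object = hash_partition(triples, num_workers=2, position=2)

subject_sizes = partition_sizes(by_subject, 2)
object_sizes = partition_sizes(by_object, 2)
```

In this toy input, hashing on subjects keeps all triples of a subject on one worker and yields equal-sized partitions, whereas hashing on objects sends every triple that points at the common object to the same worker.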
Statistics Manager. It maintains statistics about the 
RDF graph, which are used for global query planning 
and during adaptivity. Statistics are collected in a distributed
manner during bootstrapping (Section 3.3).
Redistribution Controller. It monitors the workload 
in the form of heat maps and triggers the adaptive In¬ 
cremental ReDistribution (IRD) process for hot pat¬ 
terns. Only data accessed by hot patterns are redis¬ 
tributed and potentially replicated among workers. A 
redistributed hot pattern can be answered by all work¬ 
ers in parallel without communication. Using hierarchi¬ 
cal representation, replicated hot patterns are indexed 
in a structure called Pattern Index (PI). Patterns in the 
PI can be combined for evaluating future queries without
communication. Further, the controller implements
a replica replacement policy to keep replication within a
threshold (Section 5).

Locality-Aware Query Planner. Our planner uses 
the global statistics from the statistics manager and the 
pattern index from the redistribution controller to de¬ 
cide if a query, in whole or partially, can be processed 
without communication. Queries that can be fully an¬ 
swered without communication are planned and exe¬ 
cuted by each worker independently. On the other hand, 
for queries that require communication, the planner exploits
the hash-based data locality and the query structure
to find a plan that minimizes communication and
the number of distributed joins (Section 4).

4 For simplicity, we use: i = t.subject mod W.

5 http://swat.cse.lehigh.edu/projects/lubm/

6 http://yago-knowledge.org/

Table 2 Triple distribution (in thousands of triples)

                   LUBM-4000                  YAGO2
Method        Max     Min   StDev      Max    Min   StDev
hash(subj)    527     515       3      296    267       3
hash(obj)  32,648     397   1,463    9,914    140     663
random        524     519       1      280    276       1

Failure Recovery. The master does not store any data 
but can be considered as a single-point of failure be¬ 
cause it maintains the dictionaries, global statistics, and 
PI. A standard failure recovery mechanism (log-based 
recovery El) can be employed by AdHash. Assuming 
stable storage, the master can recover by loading the 
dictionaries and global statistics because they are read¬ 
only and do not change in the system. The PI can be 
easily recovered by reading the query log and recon¬ 
structing the heat map. Workers on the other hand 
store data; hence, in case of a failure, data partitions 
need to be recovered. Shen et al. [2H| proposes a fast fail¬ 
ure recovery solution for distributed graph processing 
systems. The solution is a hybrid of checkpoint-based 
and log-based recovery schemes. This approach can be 
used by AdHash to recover worker partitions and reconstruct
the replica index. However, reliability is outside
the scope of this paper and we leave it for future work.

3.2 Worker 

Storage Module. Each worker $w_i$ stores its local set
of triples $D_i$ in an in-memory data structure, which
supports the following search operations, where s, p,
and o are subject, predicate, and object:

1. given p, return set $\{(s, o) \mid (s, p, o) \in D_i\}$.

2. given s and p, return set $\{o \mid (s, p, o) \in D_i\}$.

3. given o and p, return set $\{s \mid (s, p, o) \in D_i\}$.

Since all the above searches require a known predicate, 
we primarily hash triples in each worker by predicate. 
The resulting predicate index (simply P-index) imme¬ 
diately supports search by predicate (i.e., the first op¬ 
eration). Furthermore, we use two hash maps to re¬ 
partition each bucket of triples having the same predi¬ 
cate, based on their subjects and objects, respectively. 
These two hash maps support the second and third 
search operations and they are called predicate-subject
index (PS-index) and predicate-object index (PO-index),
respectively. Given the number of unique predicates is 
typically small, our storage scheme avoids unnecessary 
repetitions of predicate values. Note that when answer¬ 
ing a query, if the predicate itself is a variable, then we 
simply iterate over all predicates. Our indexing scheme 
is tailored for typical RDF knowledge bases and their
workloads, being orthogonal to the rest of the system
(i.e., alternative schemes, like indexing all SPO combinations
[24], could be used at each worker). Finally, the
storage module computes statistics about its local data 
and shares them with the master after data loading. 
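
The three indexes can be sketched with nested hash maps. This is an illustrative sketch only, using plain Python dictionaries and string terms for readability; the class name `WorkerStore` and its method names are hypothetical, and the actual in-memory layout of AdHash is not specified here.

```python
from collections import defaultdict


class WorkerStore:
    """Local triple store of one worker with P-, PS- and PO-indexes."""

    def __init__(self):
        self.p_index = defaultdict(list)                        # p -> [(s, o)]
        self.ps_index = defaultdict(lambda: defaultdict(list))  # p -> s -> [o]
        self.po_index = defaultdict(lambda: defaultdict(list))  # p -> o -> [s]

    def add(self, s, p, o):
        self.p_index[p].append((s, o))
        self.ps_index[p][s].append(o)
        self.po_index[p][o].append(s)

    def by_predicate(self, p):
        """Operation 1: given p, return all (s, o) pairs."""
        return list(self.p_index.get(p, []))

    def objects(self, s, p):
        """Operation 2: given s and p, return all objects."""
        bucket = self.ps_index.get(p)
        return list(bucket.get(s, [])) if bucket else []

    def subjects(self, o, p):
        """Operation 3: given o and p, return all subjects."""
        bucket = self.po_index.get(p)
        return list(bucket.get(o, [])) if bucket else []

    def match_any_predicate(self, s):
        """Variable predicate: iterate over all predicates for subject s."""
        return [(p, o)
                for p in self.ps_index
                for o in self.ps_index[p].get(s, [])]


store = WorkerStore()
for triple in [("Bill", "worksFor", "CS"), ("James", "worksFor", "CS"),
               ("Lisa", "advisor", "James"), ("John", "advisor", "Bill"),
               ("Fred", "advisor", "Bill"), ("Lisa", "advisor", "Bill")]:
    store.add(*triple)
```

Because triples are grouped by predicate first, the predicate value is stored once per bucket rather than once per triple, which matches the observation that the number of unique predicates is typically small.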
Replica Index. Each worker has an in-memory replica 
index that stores and indexes replicated data as a re¬ 
sult of the adaptivity. This index initially contains no 
data and is updated dynamically by the incremental 
redistribution (IRD) process (Section 5).

Query Processor. Each worker has a query proces¬ 
sor that operates in two modes: (i) Distributed Mode 
for queries that require communication. In this case, all 
workers solve the query concurrently and exchange intermediate
results (Section 4). (ii) Parallel Mode for
queries that can be answered without communication.
Each worker has all the data needed for query evaluation
locally (Section 5).

Local Query Planner. Queries executed in parallel 
mode are planned by workers autonomously. For exam¬ 
ple, star queries joining on the subject are processed 
in parallel due to the initial partitioning. Moreover, 
queries answered in parallel after the adaptivity pro¬ 
cess are also planned by local query planners. 

3.3 Statistics Collection 

AdHash collects and aggregates statistics from workers
for global query planning and during the adaptivity
process. Keeping statistics about each vertex in the entire
RDF data graph is prohibitively expensive. AdHash
solves the problem by focusing on predicates rather
than vertices. Therefore, the storage complexity of statistics
is linear in the number of unique predicates, which
is typically very small compared to the data size. For
each unique predicate p, we calculate the following statistics:
(i) The cardinality of p, denoted as $|p|$, is the number
of triples in the data graph that have p as predicate.
(ii) $|p.s|$ and $|p.o|$ are the numbers of unique subjects
and objects using predicate p, respectively. (iii)
The subject score of p, denoted as $p_s$, is the average degree
of all vertices s, such that $(s, p, ?x) \in D$. (iv) The
object score of p, denoted as $p_o$, is the average degree of
all vertices o, such that $(?x, p, o) \in D$. (v) Predicates
Per Subject $P_{ps} = |p|/|p.s|$ is the average number of
triples with predicate p per unique subject. (vi) Predicates
Per Object $P_{po} = |p|/|p.o|$ is the average number
of triples with predicate p per unique object.

For example, Figure 4 illustrates the computed statistics for predicate advisor using the data graph of Figure 1. Since advisor appears four times with three unique subjects and two unique objects, |p| = 4, |p.s| = 3 and |p.o| = 2. The subject score $\overline{p_S}$ is (1+3+4)/3 = 2.67 because advisor appears with three unique subjects: Fred, John and Lisa, whose degrees (i.e., in-degree plus out-degree) are 1, 3 and 4, respectively. Similarly, $\overline{p_O}$ = (6+4)/2 = 5. Finally, the number of predicates per subject $P_{ps}$ is 4/3 = 1.33 because Lisa is associated with two instances of the predicate (i.e., two advisors).

Fig. 4 Statistics calculation for p = advisor, based on Figure 1 (columns: Subject, Predicate, Object).
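The per-predicate statistics of this section can be reproduced with a few lines of Python. This is a minimal sketch rather than AdHash's implementation: the function name `predicate_stats` and the dictionary keys are illustrative, vertex degrees are supplied as input because the full data graph of Figure 1 is not reproduced here, and the assignment of the object degrees 6 and 4 to Bill and James is an assumption (the text gives only the two values).

```python
from collections import defaultdict

def predicate_stats(triples, degree):
    """Compute the six per-predicate statistics for a list of (s, p, o) triples."""
    by_pred = defaultdict(list)
    for s, p, o in triples:
        by_pred[p].append((s, o))
    stats = {}
    for p, pairs in by_pred.items():
        subjects = {s for s, _ in pairs}
        objects = {o for _, o in pairs}
        stats[p] = {
            "card": len(pairs),                      # |p|
            "n_subjects": len(subjects),             # |p.s|
            "n_objects": len(objects),               # |p.o|
            "subject_score": sum(degree[s] for s in subjects) / len(subjects),
            "object_score": sum(degree[o] for o in objects) / len(objects),
            "pps": len(pairs) / len(subjects),       # predicates per subject
            "ppo": len(pairs) / len(objects),        # predicates per object
        }
    return stats

# The four advisor triples of the running example.
advisor_triples = [
    ("Fred", "advisor", "Bill"),
    ("John", "advisor", "Bill"),
    ("Lisa", "advisor", "Bill"),
    ("Lisa", "advisor", "James"),
]
# Subject degrees are taken from the text; Bill=6 and James=4 are assumed.
degree = {"Fred": 1, "John": 3, "Lisa": 4, "Bill": 6, "James": 4}
advisor = predicate_stats(advisor_triples, degree)["advisor"]
```

Only one pass over the triples is needed, and the result has one entry per predicate, which is why the storage cost grows with the number of unique predicates rather than with the number of vertices.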

3.4 System overview 

Here we give an abstract overview of AdHash. After encoding and partitioning the data, each worker loads its triples and collects local statistics. The master node aggregates these statistics and AdHash starts answering queries. A user submits a SPARQL query Q to the master. The query planner at the master consults the redistribution controller to decide whether Q can be executed in parallel mode. The redistribution controller uses global statistics to transform Q into a hierarchical representation Q' (details in Section 5.4). If Q' exists in the Pattern Index (PI) or if Q' is a star query joining on the subject column, then Q can be answered in parallel mode; otherwise, it is executed in distributed mode. If Q is executed in distributed mode, the locality-aware planner devises a global query plan. Each worker gets a copy of this plan and evaluates the query accordingly. If Q can be answered in parallel mode, the master broadcasts the query to all workers. Each worker generates its local query plan using local statistics and executes Q without communication.

As more queries get submitted to the system, the redistribution controller updates the heat map, identifies hot patterns, and triggers the IRD process. Consequently, AdHash adapts to the query load by answering more queries in parallel mode.
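The mode decision made at the master can be illustrated as follows. This is a simplified sketch under stated assumptions: the hierarchical representation Q' is replaced by a plain `frozenset` of triple patterns, the Pattern Index is modeled as a Python `set`, and the names `is_subject_star` and `choose_mode` are hypothetical.

```python
def is_subject_star(patterns):
    """True if every triple pattern shares the same subject term."""
    return len({s for s, _, _ in patterns}) == 1

def choose_mode(patterns, pattern_index):
    """Return 'parallel' if the query needs no communication, else 'distributed'."""
    if frozenset(patterns) in pattern_index or is_subject_star(patterns):
        return "parallel"
    return "distributed"

star = [("?s", "advisor", "?p"), ("?s", "uGradFrom", "?u")]
q_prof = [("?prof", "worksFor", "CS"), ("?stud", "advisor", "?prof")]

pattern_index = set()                      # stand-in for the Pattern Index (PI)
mode_star = choose_mode(star, pattern_index)
mode_before = choose_mode(q_prof, pattern_index)
pattern_index.add(frozenset(q_prof))       # pretend IRD has redistributed this pattern
mode_after = choose_mode(q_prof, pattern_index)
```

The same query moves from distributed to parallel mode once its pattern has been redistributed, which is the adaptivity effect described above.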

4 Query Evaluation 

A basic SPARQL query consists of multiple subquery triple patterns: $q_1, q_2, \ldots, q_n$. Each subquery includes variables or constants, some of which are used to bind the patterns together, forming the entire query graph (e.g., see Figure 5(b)). A query with n subqueries requires the evaluation of n − 1 joins. Since data are memory resident and hash-indexed, we favor hash joins as they prove to be competitive to more sophisticated join methods [3]. Our query planner devises an ordering of these subqueries and generates a left-deep join tree, where the right operand of each join is a base subquery (not an intermediate result). We do not use bushy tree plans to avoid building indices for intermediate results.

Table 3 Matching result of $q_1$ on workers $w_1$ and $w_2$.

  w1          w2
  ?prof       ?prof
  James       Bill

Table 4 The final query results $q_1 \bowtie q_2$ on both workers.

4.1 Distributed Query Evaluation 

In AdHash, triples are hash partitioned among many workers based on subject values. Consequently, subject star queries (i.e., all subqueries join on the subject column) can be evaluated locally in parallel without communication. However, for other types of queries, workers may have to communicate intermediate results during join evaluation. For example, consider the query in Figure 2 and the partitioned data graph in Figure 1. The query consists of two subqueries $q_1$ and $q_2$, where:

— $q_1$: (?prof, worksFor, CS)

— $q_2$: (?stud, advisor, ?prof)

The query is evaluated by a single subject-object join; however, neither of the workers has all the data needed for evaluating the entire query. In other words, workers need to communicate because the locality of objects is not known. To solve such queries, AdHash employs the Distributed Semi-Join (DSJ) algorithm. Each worker scans the PO-index to find all triples matching $q_1$. The results on workers $w_1$ and $w_2$ are shown in Table 3. Then, each worker creates a projection on the join column ?prof and exchanges it with the other worker. Once the projected column is received, each worker computes the semi-join $q_1 \ltimes_{?prof} q_2$ using its PO-index. Specifically, $w_1$ probes p = advisor, o = Bill while $w_2$ probes p = advisor, o = James against their PO-indices. Note that workers also need to evaluate semi-joins using their local projected column. Then, the semi-join results are shipped to the sender. In this case, $w_1$ sends (Lisa, advisor, Bill) and (Fred, advisor, Bill) to $w_2$; no candidate triples are sent from $w_2$ because James has no advisees on $w_2$. Finally, each worker computes the final join $q_1 \bowtie_{?prof} q_2$. The final query results at both workers are shown in Table 4.

Table 5 The final query results $q_2 \bowtie q_1$ on both workers.
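The DSJ steps for this example can be simulated in a single process. The sketch below is illustrative only: workers are plain Python lists, message passing is replaced by direct list access, the PO-index is replaced by a linear scan, and the helper names `match` and `dsj_object_join` are hypothetical. The triple placement follows the example in the text.

```python
def match(triples, pattern):
    """Return the triples matching an (s, p, o) pattern; terms starting with '?' are variables."""
    def ok(term, value):
        return term.startswith("?") or term == value
    return [t for t in triples if all(ok(a, b) for a, b in zip(pattern, t))]

def dsj_object_join(workers, q1, q2):
    """Join q1's subject with q2's object; the projected column is broadcast."""
    n = len(workers)
    # Step 1: every worker evaluates q1 locally and projects on the join column.
    projections = [{s for s, _, _ in match(w, q1)} for w in workers]
    results = [[] for _ in range(n)]
    for sender in range(n):
        for receiver in range(n):
            # Step 2: the receiver (or the sender itself) semi-joins q2 with the
            # sender's projection and ships the candidate triples back.
            candidates = [t for t in match(workers[receiver], q2)
                          if t[2] in projections[sender]]
            # Step 3: the sender finalizes the join as (?prof, ?stud) pairs.
            results[sender].extend((o, s) for s, _, o in candidates)
    return [sorted(r) for r in results]

w1 = [("James", "worksFor", "CS"), ("Lisa", "advisor", "James"),
      ("Lisa", "advisor", "Bill"), ("Fred", "advisor", "Bill")]
w2 = [("Bill", "worksFor", "CS"), ("John", "advisor", "Bill")]

table4 = dsj_object_join([w1, w2],
                         ("?prof", "worksFor", "CS"),
                         ("?stud", "advisor", "?prof"))
```

Each worker ends up holding the results for the professors it matched locally, which mirrors the per-worker contents reported for $q_1 \bowtie q_2$.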

4.1.1 Hash-based data locality

Observation 1 DSJ can benefit from subject hash locality to minimize communication. If the join column of the right operand is subject, the projected column of the left operand is hash distributed among all workers. Otherwise, the projected column on each worker is broadcast to all other workers.

In our example, since the join column of $q_2$ is the object column (?prof), each worker sends the entire join column to the other worker. However, based on Observation 1, communication can be minimized if the join order is reversed (i.e., $q_2 \bowtie q_1$). In this case, each worker scans the P-index to find triples matching $q_2$ and creates a projection on ?prof. Then, because ?prof is the subject of $q_1$, both workers exploit the subject hash-based locality by partitioning the projection column and communicating each partition to the respective worker, as opposed to broadcasting the entire projection column to all workers. Consequently, $w_1$ sends Bill only to $w_2$ because of Bill's hash value. The final query results are shown in Table 5. Notice that the final results are the same for both query plans; however, the results reported by each worker are different.
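The difference between hash distribution and broadcasting of the projected column can be made concrete with a small routing function. This is a sketch under assumptions: the `owner` map stands in for the subject hash function (James on worker 0, Bill on worker 1, matching the example), workers are numbered from 0, and `route_projection` is a hypothetical name.

```python
def route_projection(local_values, sender, n_workers, owner, join_on_subject):
    """Return {receiver: values} that `sender` must transmit over the network."""
    outbox = {w: set() for w in range(n_workers) if w != sender}
    for value in local_values:
        if join_on_subject:
            # Hash distribution: a value goes only to the worker that owns it.
            target = owner[value]
            if target != sender:
                outbox[target].add(value)
        else:
            # Broadcast: every other worker receives the whole column.
            for w in outbox:
                outbox[w].add(value)
    return outbox

owner = {"James": 0, "Bill": 1}       # assumed result of hashing the subjects
proj_w1 = {"James", "Bill"}           # ?prof bindings of q2 found on worker 0
proj_w2 = {"Bill"}                    # ?prof bindings of q2 found on worker 1

hashed_w1 = route_projection(proj_w1, 0, 2, owner, join_on_subject=True)
hashed_w2 = route_projection(proj_w2, 1, 2, owner, join_on_subject=True)
bcast_w1 = route_projection(proj_w1, 0, 2, owner, join_on_subject=False)
bcast_w2 = route_projection(proj_w2, 1, 2, owner, join_on_subject=False)
```

With hash distribution only one value crosses the network (Bill, from worker 0 to worker 1), whereas broadcasting ships three values in total.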

4.1.2 Pinned subject

Observation 2 Under the subject hash partitioning, combining right-deep tree planning and the DSJ algorithm for solving SPARQL queries causes the intermediate and final results to be local to the subject of the first executed subquery pattern $p_1$. We refer to this subject as pinned_subject.

In our example, executing $q_1$ first causes ?prof to be the pinned_subject because it is the subject of $q_1$. Hence, the intermediate and final results are local (pinned) to the bindings of ?prof, James and Bill in $w_1$ and $w_2$, respectively. Changing the order by executing $q_2$ first makes ?stud the pinned_subject. Accordingly, the results are pinned at the bindings of ?stud.


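Contents of Table 4 (results of $q_1 \bowtie q_2$):

  w1                    w2
  ?prof    ?stud        ?prof    ?stud
  James    Lisa         Bill     Lisa
                        Bill     John
                        Bill     Fred

Contents of Table 5 (results of $q_2 \bowtie q_1$):

  w1                    w2
  ?prof    ?stud        ?prof    ?stud
  James    Lisa         Bill     John
  Bill     Lisa
  Bill     Fred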
Fig. 5 Executing query $Q_{prof}$ using two different subquery orderings: (a) $q_1, q_2, q_3$; (b) $q_2, q_1, q_3$. [Figure: per-worker execution plans for Worker 1 and Worker 2. Operator labels recovered from the figure: SCAN(PO-index), PROBE(PO-index), PROBE(PS-index), DSJ on ?prof, BroadcastJC(?prof).]


Consequently, AdHash leverages Observations 1 and 2 to minimize communication and synchronization overhead. To see this, consider $Q_{prof}$, which extends the query in Figure 2 with one more triple pattern, namely $q_3$: (?stud, uGradFrom, ?univ). Assume $Q_{prof}$ is executed in the following order: $q_1, q_2, q_3$. The query execution plan is shown in Figure 5(a). The results of the first join (i.e., $q_1 \bowtie q_2$) are shown in Table 4, where ?prof is the pinned_subject as demonstrated above. The query continues by joining the intermediate result ($q_1 \bowtie q_2$) with $q_3$ on ?stud, the subject of $q_3$. Both workers project the intermediate results on ?stud and hash distribute the bindings of ?stud (Observation 1). Then, all workers evaluate semi-joins with $q_3$ and return the candidate triples to the other workers, where the final query results are formulated.

Notice that the execution order $q_1, q_2, q_3$ requires communication for evaluating both joins. Nonetheless, a better ordering that would potentially minimize communication is $q_2, q_1, q_3$. The execution plan is shown in Figure 5(b). The first join (i.e., $q_2 \bowtie q_1$) was already shown to incur less communication by avoiding the need for broadcasting the entire projection column. The results of this join are pinned at ?stud, as shown in Table 5. Since the join column of $q_3$ (?stud) is the pinned_subject, joining the intermediate results ($q_2 \bowtie q_1$) with $q_3$ can be processed locally by each worker without communication using Local Hash Join (LHJ). Therefore, the ordering of the subqueries affects the amount of communication incurred during query execution.

4.1.3 The four cases of a join

Formally, joining two subqueries, say $p_i$ (possibly an intermediate pattern) and $p_j$, has four possible scenarios; the first three assume that $p_i$ and $p_j$ join on columns $c_1$ and $c_2$, respectively. (i) If $c_2$ = subject AND $c_2$ = pinned_subject, then the join can be answered by all workers in parallel without communication. (ii) If $c_2$ = subject AND $c_2 \neq$ pinned_subject, then the join is evaluated using DSJ, but the projected join column of $p_i$ is hash distributed. (iii) If $c_2 \neq$ subject, then the join is executed using DSJ and the projected join column of $p_i$ is sent from all workers to all other workers. This includes joining on the object or predicate column. Finally, (iv) if $p_i$ and $p_j$ join on multiple columns, we opt to join on the subject column of $p_j$, if it is a join attribute. This allows the join column of $p_i$ to be hash distributed as in (ii). If the subject column of $p_j$ is not a join attribute, we join on another column of $p_j$ and broadcast the projection column to all workers, as in scenario (iii). Verification on the other columns is carried out during the join finalization by the DSJ.
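The four scenarios reduce to a small decision function. The following sketch is one possible encoding, not the system's code: `JoinStrategy` and `join_strategy` are hypothetical names, a pattern is an (s, p, o) tuple, and `join_vars` is the set of variables that the right operand shares with the left one.

```python
from enum import Enum

class JoinStrategy(Enum):
    LOCAL = "local join, no communication"
    DSJ_HASH = "DSJ, projected column hash distributed"
    DSJ_BROADCAST = "DSJ, projected column sent to all workers"

def join_strategy(pattern, join_vars, pinned_subject):
    """Pick the strategy for joining `pattern` (the right operand p_j)."""
    subject = pattern[0]
    # Scenario (iv): with several join columns, prefer the subject column.
    if subject in join_vars:
        if subject == pinned_subject:
            return JoinStrategy.LOCAL          # scenario (i)
        return JoinStrategy.DSJ_HASH           # scenario (ii)
    return JoinStrategy.DSJ_BROADCAST          # scenario (iii)

q1 = ("?prof", "worksFor", "CS")
q2 = ("?stud", "advisor", "?prof")
q3 = ("?stud", "uGradFrom", "?univ")

case_i = join_strategy(q3, {"?stud"}, pinned_subject="?stud")
case_ii = join_strategy(q1, {"?prof"}, pinned_subject="?stud")
case_iii = join_strategy(q2, {"?prof"}, pinned_subject="?prof")
case_iv = join_strategy(("?a", "knows", "?b"), {"?a", "?b"}, pinned_subject="?x")
```

The pattern `("?a", "knows", "?b")` in the last call is made up purely to exercise the multi-column case.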

4.1.4 Evaluation of join orderings

Based on the above four scenarios, we introduce our Locality-Aware Distributed Query Execution algorithm (see Algorithm 1). The algorithm receives an ordering of the subquery patterns. For each join iteration, if the second subquery joins on a subject which is the pinned_subject, the join is executed without communication (line 7). Otherwise, the join is evaluated with the DSJ algorithm (lines 9-28). In the first iteration, $p_1$ is a base subquery pattern; however, for the subsequent iterations $p_1$ is a pattern of intermediate results. If $p_1$ is the first subquery to be matched, each worker finds the local matching of $p_1$ (line 10) and projects on the join column $c_1$ (line 13). If the join column of $p_2$ is subject, then each worker hash distributes the projected column (line 15); otherwise, it sends the column to all other workers (line 17). All workers perform a semi-join on the received data (line 22) and send the results back to w (line 23). Finally, each worker finalizes the join (line 27) and formulates the final result (line 28). Lines 22 and 27 are implemented as local hash-joins using the local index in each worker. The final result of a DSJ iteration becomes $p_1$ in the next iteration.

Algorithm 1: Locality-Aware Distributed Execution

Input: Query Q with n ordered subqueries {q_1, q_2, ..., q_n}
Result: Answer of Q

 1  p_1 ← q_1;
 2  pinned_subject ← p_1.subject;
 3  for i ← 2 to n do
 4      p_2 ← q_i;
 5      [c_1, c_2] ← getJoinColumns(p_1, p_2);
 6      if c_2 == pinned_subject AND c_2 is subject then
 7          p_1 ← JoinWithoutCommunication(p_1, p_2, c_1, c_2);
 8      else
 9          if p_1 NOT intermediate pattern then
10              RS_1 ← answerSubquery(p_1);
11          else
12              RS_1 is the result of the previous join
13          RS_1[c_1] ← π_{c_1}(RS_1);  // projection on c_1
14          if c_2 is subject then
15              Hash RS_1[c_1] among workers;
16          else
17              Send RS_1[c_1] to all workers;
18          Let RS_2 ← answerSubquery(p_2);
19          foreach worker w, w : 1 → N do
20              Let RS_{1w}[c_1] denote the RS_1[c_1] received from w
21              Let CRS_{2w} be the candidate triples of RS_2 that join with RS_{1w}[c_1]
22              CRS_{2w} ← RS_{1w}[c_1] ⋈_{RS_{1w}[c_1].c_1 = RS_2.c_2} RS_2;
23              Send CRS_{2w} to worker w;
24          foreach worker w, w : 1 → N do
25              Let RS_{2w} be the CRS_{2w} received from worker w
26              Let RES_w be the result of joining with worker w
27              RES_w ← RS_1 ⋈_{RS_1.c_1 = RS_{2w}.c_2} RS_{2w};
28          p_1 ← RES_1 ∪ RES_2 ∪ ... ∪ RES_N;

Algorithm 1 can solve star queries that join on the subject in parallel mode. However, the planning is done by the master using global statistics. We argue that allowing each worker to plan the query execution autonomously would result in better performance. For example, using the data graph in Figure 1, Table 6 shows triples that match the following star query:

— $q_1$: (?s, advisor, ?p)

— $q_2$: (?s, uGradFrom, ?u)

Any global plan (i.e., $q_1 \bowtie q_2$ or $q_2 \bowtie q_1$) would require a total of four index lookups to solve the join. However, $w_1$ and $w_2$ can evaluate the join using 2 and 1 index lookup(s), respectively. Therefore, to solve such queries, the master sends the query to all workers; each worker utilizes its local statistics to formulate the execution plan, evaluates the query locally without communication, and sends the final result to the master.


Table 6 Triples matching (?s, advisor, ?p) and (?s, uGradFrom, ?u) on two workers.

  Worker 1                Worker 2
  advisor                 advisor
  ?s       ?p             ?s       ?p
  Fred     Bill           John     Bill
  Lisa     Bill
  Lisa     James

  uGradFrom               uGradFrom
  ?s       ?u             ?s       ?u
  Lisa     MIT            Bill     CMU
  James    CMU            John     CMU


4.2 Locality-Aware Query Optimization

Our locality-aware planner leverages the query structure and the hash-based data distribution during query plan generation to minimize communication. Accordingly, the planner uses a cost-based optimizer for finding the best subquery ordering. We use Dynamic Programming (DP) for optimizing the plan.

Each state S in DP is identified by a subgraph g of the query graph. A state can be reached by different orderings on g. Therefore, we maintain in each state the ordering that results in the least estimated communication cost (S.cost). We also keep estimated cardinalities of the variables in the query. Furthermore, instead of maintaining the cardinality of the state, we keep the cumulative cardinality of all intermediate results that led to this state. The reason is that the cardinality of the state will be the same regardless of the ordering. However, reaching the same state using different orderings will result in different cumulative cardinalities.

We initialize a state S for each subquery pattern (subgraph of size 1) $p_i$. S.cost is initially zero because a query with a single pattern can be answered without communication. Then, we expand the subgraph by joining with another pattern $p_j$, leading to a new state S' such that:

$$S'.cost = \min(S'.cost,\ S.cost + cost(S, p_j))$$

If we reach a state using different orderings with the same cost, we keep the one with the least cumulative cardinality. This happens for subqueries that join on the pinned_subject. To minimize the DP table size, we maintain a global minimum cost (minC) of all found plans. Because our cost function is monotonically increasing, any branch that results in a cost > minC is pruned. Moreover, because of Observation 1, we start the DP process by considering subqueries connected to the subject with the highest number of outgoing edges. Considering these subqueries first increases the probability of converging to the optimal plan faster.
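The dynamic-programming search over subquery orderings can be sketched as follows. This is a deliberately simplified illustration, not the AdHash planner: the step cost uses assumed unit weights (0 for a local join on the pinned subject, 1 for a hash-distributed join, `n_workers` for a broadcast join) instead of cardinality-based estimates, the tie-break on cumulative cardinality and the minC pruning are omitted, the query graph is assumed to be connected, and `plan_order` is a hypothetical name.

```python
def plan_order(patterns, n_workers):
    """Return (cost, ordering) for the cheapest ordering found by DP over subsets."""
    def variables(pattern):
        return {t for t in pattern if t.startswith("?")}

    def step_cost(order, j):
        bound = set().union(*(variables(patterns[i]) for i in order))
        shared = bound & variables(patterns[j])
        if not shared:
            return None                      # not connected yet: skip cross products
        subject = patterns[j][0]
        pinned = patterns[order[0]][0]       # subject of the first executed pattern
        if subject in shared:
            return 0 if subject == pinned else 1
        return n_workers

    n = len(patterns)
    # One initial state per single pattern, with zero communication cost.
    best = {frozenset([i]): (0, (i,)) for i in range(n)}
    for size in range(1, n):
        for state in [s for s in best if len(s) == size]:
            cost, order = best[state]
            for j in range(n):
                if j in state:
                    continue
                c = step_cost(order, j)
                if c is None:
                    continue
                new_state = state | {j}
                # Keep only the cheapest ordering that reaches each state.
                if new_state not in best or cost + c < best[new_state][0]:
                    best[new_state] = (cost + c, order + (j,))
    return best[frozenset(range(n))]

q_prof = [("?prof", "worksFor", "CS"),        # q1
          ("?stud", "advisor", "?prof"),      # q2
          ("?stud", "uGradFrom", "?univ")]    # q3
plan_cost, plan = plan_order(q_prof, n_workers=2)
```

For $Q_{prof}$ the sketch selects the ordering $q_2, q_1, q_3$ (indices 1, 0, 2), in which only the join with $q_1$ needs communication.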


4.3 Cost Estimation

We set the initial communication cost of DP states to zero. Cardinalities of subqueries with variable subjects and objects are already captured in the master's global statistics. Hence, we set the cumulative cardinalities of the initial states to the cardinalities of the subqueries themselves and set the size of the subject and object bindings to |p.s| and |p.o|. Furthermore, the master consults the workers to update the cardinalities of subquery patterns that are attached to constants or have unbounded predicates. This is done locally at each worker by simple lookups to its PS- and PO-indices to update the cardinalities of variable bindings accordingly.

We estimate the cost of expanding a state S with a subquery $p_j$, where $c_j$ and P are the join column and the predicate of $p_j$, respectively. If the join does not incur communication, the cost of the new state S' is zero. Otherwise, the expansion is carried out through DSJ and we incur two phases of communication: (i) transmitting the projected join column and (ii) replying with the candidate triples. Estimating the communication in the first phase depends on the cardinality of the join column bindings in S, denoted as $B(c_j)$. In the second phase, communication depends on the selectivity of the semi-join and the number of variables $\nu$ in $p_j$ (constants are not communicated). Moreover, if $c_j$ is the subject column of $p_j$, we hash distribute the projected column. Otherwise, the column needs to be sent to all workers. The cost of expanding S with $p_j$ is:

$$
cost(S, p_j) =
\begin{cases}
0 & \text{if } c_j \text{ is subject} \;\&\; c_j = pinned\_subject \\
S.B(c_j) + (\nu \cdot S.B(c_j) \cdot P_{ps}) & \text{if } c_j \text{ is subject} \;\&\; c_j \neq pinned\_subject \\
(S.B(c_j) \cdot N) + (\nu \cdot N \cdot S.B(c_j) \cdot P_{po}) & \text{if } c_j \text{ is not subject}
\end{cases}
$$

Next, we need to re-estimate the cardinalities of all variables in $p_j$. For each variable $v \in p_j$, let $|p.v|$ denote |p.s| or |p.o| if v is subject or object, respectively. Similarly, let $P_{pv}$ denote $P_{ps}$ if v is subject or $P_{po}$ if v is object. We re-estimate the cardinality of v in the new state S' as:

$$
S'.B(v) =
\begin{cases}
\min(S.B(v),\ |P|) & \text{if } \nu = 1 \\
\min(S.B(v),\ |p.v|) & \text{if } v = c_j \;\&\; \nu > 1 \\
\min(S.B(v),\ S.B(v) \cdot P_{pv},\ |p.v|) & \text{if } v \neq c_j \;\&\; \nu > 1
\end{cases}
$$

We use the cumulative cardinality when we reach the same state using two different orderings. Therefore, we also re-estimate the cumulative state cardinality |S'|. Let $P_{pc_j}$ denote $P_{ps}$ or $P_{po}$ depending on the position of $c_j$; then $|S'| = |S| \cdot (1 + P_{pc_j})$. Notice that we use an upper bound estimation of the cardinalities. A special case of the last equation is when a subquery has a constant. In this case, we assume that each tuple in the previous state has a connection to this constant by setting $P_{pc_j}$ to 1.

5 AdHash Adaptivity

Studies show that even minimal communication results in significant performance degradation [I81I22]. Thus, data need to be redistributed to minimize, if not eliminate, communication and synchronization overheads. AdHash adapts to the workload by redistributing only the parts of data needed for the current workload and adapts as the workload changes. The incremental redistribution model of AdHash is a combination of hash partitioning and k-hop replication; however, it is guided by the query load rather than the data itself. Specifically, given a hot pattern Q (hot pattern detection is discussed in Section 5.4), our system selects a special vertex in the pattern called the core vertex (Section 5.1). The system groups the data accessed by the pattern around the bindings of this core vertex. To do so, the system transforms the pattern into a redistribution tree rooted at the core (Section 5.2). Then, starting from the core vertex, first-hop triples are hash distributed based on the core bindings. Next, triples that bind to the second-level subqueries are collocated, and so on (Section 5.3). AdHash utilizes these redistributed patterns to answer queries in parallel without communication.

5.1 Core Vertex Selection


For a hot pattern, the choice of the core has a significant impact on the amount of replicated data as well as on query execution performance. For example, consider query $Q_1$ = (?stud, uGradFrom, ?univ). Assume there are two workers, $w_1$ and $w_2$, and refer to the graph of Figure 1. MIT and CMU are the bindings of ?univ, whereas Lisa, John, James and Bill bind to ?stud. Assume that ?univ is the core; then triples matching $Q_1$ will be hashed on the bindings of ?univ, as shown in Figure 6(a). Note that every binding of ?stud appears in one worker only. Now assume that ?stud is the core and triples are hashed using the bindings of ?stud. This causes the binding ?univ = CMU to exist on both workers (see Figure 6(b)). The problem becomes more pronounced when the query has more triple patterns. Consider $Q_2$ = $Q_1$ AND (?prof, gradFrom, ?univ) and assume that ?stud is chosen as core. Because CMU exists on both workers, all its graduates will also be replicated (i.e., triples matching (?prof, gradFrom, CMU) will be replicated on both workers). Replication can become significant because it grows exponentially with the number of triple patterns.
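The effect of the core choice on replication can be checked on the four uGradFrom triples of the example. The sketch below makes two assumptions: the `owner` map replaces the hash function with a fixed placement (MIT, Lisa and James on worker 0; CMU, Bill and John on worker 1), and the helper names `distribute` and `workers_holding` are hypothetical.

```python
from collections import defaultdict

def distribute(triples, core_position, owner):
    """Group triples by the worker that owns the binding at `core_position` (0 or 2)."""
    workers = defaultdict(list)
    for t in triples:
        workers[owner[t[core_position]]].append(t)
    return workers

def workers_holding(workers, value):
    """Return the set of workers storing at least one triple that mentions `value`."""
    return {w for w, ts in workers.items()
            if any(value in (s, o) for s, _, o in ts)}

q1_triples = [("Lisa", "uGradFrom", "MIT"), ("James", "uGradFrom", "CMU"),
              ("Bill", "uGradFrom", "CMU"), ("John", "uGradFrom", "CMU")]
owner = {"MIT": 0, "CMU": 1, "Lisa": 0, "James": 0, "Bill": 1, "John": 1}

by_univ = distribute(q1_triples, core_position=2, owner=owner)   # core = ?univ
by_stud = distribute(q1_triples, core_position=0, owner=owner)   # core = ?stud

cmu_with_univ_core = workers_holding(by_univ, "CMU")
cmu_with_stud_core = workers_holding(by_stud, "CMU")
```

With ?univ as the core, CMU lives on a single worker; with ?stud as the core, it appears on both, so any pattern attached to CMU would have to be replicated.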
Fig. 6 Effect of choice of core on replication. In (a), where the core is ?univ, there is no replication. In (b), where the core is ?stud, CMU exists on both workers.


Intuitively, if random walks start from two random 
vertices (e.g., students), the probability of reaching the 
same well-connected vertex (e.g., university) within a 
few hops is higher than reaching the same student from 
two universities. In order to minimize replication, we 
must avoid reaching the same vertex when starting from 
the core. Hence, it is reasonable to select a well-connected 
vertex as the core. 

In the literature there are many definitions of what constitutes a well-connected vertex, many of which are based on complex data mining algorithms. In contrast, we employ a definition that poses minimal computational overhead. We assume that connectivity is proportional to degree centrality (i.e., in-degree plus out-degree edges) of a vertex. However, many RDF datasets follow the power-law distribution, where few vertices are of extremely high degrees. For example, vertices that appear as objects in triples with rdf:type have very high degree centrality. Treating such vertices as cores results in imbalanced partitions and prevents the system from taking full advantage of parallelism [IB).

Recall from Section 3.3 that we maintain statistics $\overline{p_S}$ and $\overline{p_O}$ for each predicate $p \in P$, where P is the set of all predicates in the data. Let $P_S$ and $P_O$ be the set of all $\overline{p_S}$ and $\overline{p_O}$, respectively. We filter out predicates with extremely high scores and consider them outliers. Outliers are filtered out using Chauvenet's criterion [5] on $P_S$ then $P_O$. If a predicate p is detected as an outlier, we set $\overline{p_S} = \overline{p_O} = -\infty$; else we use $\overline{p_S}$ and $\overline{p_O}$ as computed in Section 3.3. Now, we can compute a score for each vertex in the query as follows:

Definition 1 (Vertex score) For a query vertex v, let $E_{out}(v)$ be the set of outgoing edges and $E_{in}(v)$ be the set of incoming edges. Also, let A be the set of all $\overline{p_S}$ for the $E_{out}(v)$ edges and all $\overline{p_O}$ for the $E_{in}(v)$ edges. The vertex score $\overline{v}$ is defined as: $\overline{v} = \max(A)$.
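Definition 1 translates directly into code. The sketch below assumes outliers have already been filtered (Chauvenet's criterion is not implemented here) and picks the highest-scoring vertex as the core. The advisor scores (2.67 and 5) come from Section 3.3 and the subject score 5 for gradFrom from the ?prof example; every other score value is hypothetical and chosen only for illustration, as are the function names.

```python
def vertex_scores(query, subject_score, object_score):
    """Score each query vertex as the max over its incident predicate scores."""
    scores = {}
    for s, p, o in query:
        # An outgoing edge contributes the predicate's subject score ...
        scores[s] = max(scores.get(s, float("-inf")), subject_score[p])
        # ... and an incoming edge contributes its object score.
        scores[o] = max(scores.get(o, float("-inf")), object_score[p])
    return scores

def core_vertex(scores):
    """Return the vertex with the highest score."""
    return max(scores, key=scores.get)

query = [("?stud", "advisor", "?prof"),
         ("?stud", "uGradFrom", "?univ"),
         ("?prof", "gradFrom", "?univ")]
# uGradFrom scores and the gradFrom object score are hypothetical values.
subject_score = {"advisor": 2.67, "uGradFrom": 2.0, "gradFrom": 5.0}
object_score = {"advisor": 5.0, "uGradFrom": 8.0, "gradFrom": 8.0}

scores = vertex_scores(query, subject_score, object_score)
core = core_vertex(scores)
```

Under these assumed values ?prof scores 5 and ?univ is selected as the core, because universities are reached through predicates whose objects have the highest average degree.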

Figure 7 shows an example of vertex score assignment. For vertex ?prof, $E_{in}(?prof)$ = {advisor} and $E_{out}(?prof)$ = {gradFrom}. Both predicates (i.e., advisor and gradFrom) contribute a score of 5 to ?prof. Therefore, the score of ?prof is 5.

Definition 2 (Core vertex) Given a query Q, the vertex v' with the highest score is called the core vertex.

In Figure 7, ?univ has the highest score; hence, it is the core vertex for this pattern.

Fig. 7 Example of vertex score: numbers correspond to $\overline{p_S}$ and $\overline{p_O}$ values. Assigned vertex scores $\overline{v}$ are shown in bold.

Algorithm 2: Pattern Transformation

Input: G = {V, E}; a vertex-weighted, undirected graph, the core vertex v'
Result: The redistribution tree T

 1  Let edges be a priority queue of pending edges
 2  Let verts be a set of pending vertices
 3  Let core_edges be all incident edges to v'
 4  visited[v'] = true;
 5  T.root = v';
 6  foreach e in core_edges do
 7      edges.push(v', e.nbr, e.pred);
 8      verts.insert(e.nbr);
 9      T.add(v', e.pred, e.nbr);
10  while edges notEmpty do
11      (parent, vertex, predicate) ← edges.pop();
12      visited[vertex] = true;
13      verts.remove(vertex);
14      foreach e in vertex.edges do
15          if e.nbr NOT visited then
16              if e.nbr ∉ verts then
17                  edges.push(vertex, e.nbr, e.pred);
18                  verts.insert(e.nbr);
19                  T.add(vertex, e.pred, e.nbr);
20              else
21                  T.add(vertex, e.pred, duplicate(e.nbr));


5.2 Generating the Redistribution Tree 

Let Q be a hot pattern that AdHash decides to redistribute and let $D_Q$ be the data accessed by this pattern. Our goal is to redistribute (partition) $D_Q$ among all workers such that Q can be evaluated without communication. Unlike previous work that performs static MinCut-based partitioning, we eliminate the edge cuts by replicating edges that cross partition boundaries. Since the partitioning is an NP-complete problem, we introduce a heuristic for partitioning $D_Q$ with two objectives in mind: (i) the redistribution of $D_Q$ should benefit Q as well as other patterns; (ii) because replication is necessary for eliminating communication, redistributing $D_Q$ should result in minimal replication.

Fig. 8 The query in Figure 7 transformed into a tree using Algorithm 2. Numbers near vertices define their scores. The shaded vertex is the core.

To address the first objective, we transform the pattern Q into a tree T by breaking cycles and duplicating some vertices in the cycles. The reason is that cycles constrain the data grouped around the core to be also cyclic. For example, the query pattern in Figure 7 retrieves students who share the same alma mater with their advisors. Grouping the data around universities without removing the cycle is not useful for retrieving professors and their advisees who do not share the same university. Consequently, the pattern in Figure 7 can be transformed into a tree by breaking the cycle and duplicating the ?stud vertex, as shown in Figure 8. We refer to the result of the transformation as the redistribution tree.

Our goal is to construct the redistribution tree that minimizes the expected amount of replication. In Section 5.1 we explained why starting from the vertex with the highest score has the potential to minimize replication. Intuitively, the same idea applies recursively to each level of the redistribution tree; i.e., every child node in the tree has a lower score than its parent. Obviously, this cannot always be achieved; for example, in a path pattern where a lower score vertex comes between two high score vertices. Therefore, we use a greedy algorithm for transforming a hot pattern Q into a redistribution tree T. Specifically, using the scoring function discussed in the previous section, we first transform Q into a vertex-weighted, undirected graph G, where each node has a score and the directions of edges in Q are disregarded. The vertex with the highest score is selected as the core vertex. Then, G is transformed into the redistribution tree using Algorithm 2.

Table 7 Triples from Figure 1 matching patterns in Figure 8.

  Worker 1                          Worker 2
  t1   (Lisa, uGradFrom, MIT)       t3   (Bill, uGradFrom, CMU)
                                    t4   (James, uGradFrom, CMU)
                                    t5   (John, uGradFrom, CMU)
  t2   (James, gradFrom, MIT)       t6   (Bill, gradFrom, CMU)
  t7   (Lisa, advisor, James)       t8   (Fred, advisor, Bill)
                                    t9   (John, advisor, Bill)
                                    t10  (Lisa, advisor, Bill)

Algorithm 2 is a modified version of the Breadth-First-Search (BFS) algorithm, which has the following differences: (i) unlike BFS trees, which span all vertices in the graph, our tree spans all edges in the graph. Each of the edges in the query graph should appear exactly once in the tree, while vertices may be duplicated. (ii) During traversal, vertices with high scores are identified and explored first (using a priority queue). Since our traversal needs to span all edges, elements in the priority queue are stored as edges of the form (parent, vertex, predicate). These elements are ordered based on the vertex score first, then on the edge label (predicate). Since the exploration does not follow the traditional BFS ordering, we maintain a pointer to the parent so edges can be inserted properly in the tree. As an example, consider the query in Figure 7. Having the highest score, ?univ is chosen as core, and the query is transformed into the tree shown in Figure 8. Note that the nodes have weights (scores) and the directions of edges have been restored.
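Algorithm 2 can be sketched in Python with `heapq` as the priority queue. This is an illustrative rendering with simplifications that are assumptions of the sketch: tree edges are recorded as (parent, predicate, child) tuples without the original edge direction, a duplicated vertex is marked by appending an apostrophe to its name, the parent pointer is omitted from queue entries because the tree edge is added at push time, and the vertex scores other than ?prof = 5 are hypothetical.

```python
import heapq

def redistribution_tree(edges, score, core):
    """Transform a query graph into a tree that spans every edge exactly once."""
    adj = {}
    for s, p, o in edges:                      # treat the query graph as undirected
        adj.setdefault(s, []).append((o, p))
        adj.setdefault(o, []).append((s, p))

    tree, visited, pending, heap = [], {core}, set(), []
    for nbr, pred in adj[core]:
        # Higher scores first, ties broken by predicate label.
        heapq.heappush(heap, (-score[nbr], pred, nbr))
        pending.add(nbr)
        tree.append((core, pred, nbr))
    while heap:
        _, _, vertex = heapq.heappop(heap)
        visited.add(vertex)
        pending.discard(vertex)
        for nbr, pred in adj[vertex]:
            if nbr in visited:
                continue
            if nbr not in pending:
                heapq.heappush(heap, (-score[nbr], pred, nbr))
                pending.add(nbr)
                tree.append((vertex, pred, nbr))
            else:
                # The neighbor is already in the tree: break the cycle with a duplicate.
                tree.append((vertex, pred, nbr + "'"))
    return tree

query = [("?stud", "advisor", "?prof"),
         ("?stud", "uGradFrom", "?univ"),
         ("?prof", "gradFrom", "?univ")]
score = {"?univ": 8.0, "?prof": 5.0, "?stud": 2.67}   # ?univ and ?stud values assumed
tree = redistribution_tree(query, score, core="?univ")
```

On the cyclic pattern of the example, the result has the two core-adjacent edges at the first level and the advisor edge under ?prof, pointing to a duplicate of ?stud.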


5.3 Incremental Redistribution 

Incremental ReDistribution (IRD) aims at redistributing data accessed by hot patterns among all workers in a way that eliminates communication while achieving high parallelism. Given a redistribution tree, AdHash distributes the data along paths from the root to leaves using depth-first traversal. The algorithm has two phases. First, it distributes triples containing the core vertex to workers using a hash function $\mathcal{H}(\cdot)$. Let t be such a triple and let t.core be its core vertex (the core can be either the subject or the object of t). Let $w_1, \ldots, w_N$ be the workers; t will be hash-distributed to worker $w_j$, where $j = \mathcal{H}(t.core) \bmod N$. Note that if t.core is a subject, t will not be replicated by IRD because of the initial subject-based hash partitioning.

In Figure 8, consider the first-hop triple patterns (?stud, uGradFrom, ?univ) and (?prof, gradFrom, ?univ). The core ?univ determines the placement of $t_1$–$t_6$ (see Table 7). Assuming two workers, $t_1$ and $t_2$ are hash-distributed to $w_1$ (because of MIT), whereas $t_3$–$t_6$ are hash-distributed to $w_2$ (because of CMU). The objects of triples $t_1$–$t_6$ are called their source columns.

Definition 3 (Source column) The source column of a triple is the column (subject or object) that determines its placement.

The second phase of the IRD places triples of the remaining levels of the tree in the workers that contain their parent triples, through a series of distributed semi-joins. The column at the opposite end of the source column of the previous step becomes the propagating column; in our previous example, the propagating column is the subject (i.e., ?prof).
Definition 4 (Propagating column) The propagating column of a triple is its object (resp. subject) if the source column of the triple is its subject (resp. object).

At the second level of the redistribution tree in Figure 8, the only subquery pattern is (?stud, advisor, ?prof). The propagating column ?prof from the previous level becomes the source column for the current pattern. Triples $t_7$–$t_{10}$ in Table 7 match the subquery and are joined with their parent triples. Accordingly, $t_7$ is placed in worker $w_1$, whereas $t_8$, $t_9$ and $t_{10}$ are sent to $w_2$.


Algorithm 3: Incremental Redistribution

Input: P = {E}; a path of consecutive edges, C is the core vertex.
Result: Data replicated along path P

    // hash-distributing the first (core-adjacent) edge
 1  if e_0 is not replicated then
 2      coreData = getTriplesOfSubQuery(e_0);
 3      foreach t in coreData do
 4          m = H(t.core) mod N;  // N is the number of workers
 5          sendToWorker(t, m);
    // then collocate triples from other levels
 6  foreach i : 1 → |E| do
 7      if e_i is not replicated then
 8          candidTriples = DSJ(e_0, e_i);
 9          IndexCandidateTriples(candidTriples);
10      e_0 = e_i;


The IRD process is formally described in Algorithm 3.
For brevity, we describe the algorithm on a path
input since we follow depth-first traversal. The
algorithm runs in parallel on all workers. Lines 1-5
hash-distribute triples that contain the core vertex C, if
necessary (footnote 7). Then, triples of the remaining levels
are localized (replicated) in the workers that contain their
parent. Replication is avoided for each triple that is already
in the worker. This is carried out through a series of
DSJs (lines 6-10). We maintain candidate triples in each
level rather than final join results. Managing replicas in
raw triple format allows us to utilize the RDF indices
when answering queries using replicated data.
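
The two IRD phases can be illustrated with a small single-process sketch. It is not the AdHash implementation: the CRC32 stand-in for H(·), the helper names (hash_distribute, collocate) and the sample triples are all assumptions made for illustration, and the distributed semi-join is only simulated by a local lookup.

```python
import zlib
from collections import defaultdict

def H(vertex):
    """Stand-in for the hash function H(.); CRC32 is an assumption of this sketch."""
    return zlib.crc32(vertex.encode("utf-8"))

def source_value(triple, source):
    """Value in the source column ('subject' or 'object') of a triple."""
    s, _, o = triple
    return s if source == "subject" else o

def propagating_value(triple, source):
    """Value in the propagating column, i.e. the end opposite to the source."""
    s, _, o = triple
    return o if source == "subject" else s

def hash_distribute(triples, source, n_workers):
    """Phase 1: place each core-adjacent triple at worker H(t.core) mod N."""
    placement = defaultdict(list)
    for t in triples:
        placement[H(source_value(t, source)) % n_workers].append(t)
    return placement

def collocate(parents, parent_source, children, child_source):
    """Phase 2: copy each child triple to every worker holding a matching parent.

    This mimics the outcome of a distributed semi-join inside one process.
    """
    placement = defaultdict(list)
    for worker, parent_triples in parents.items():
        keys = {propagating_value(t, parent_source) for t in parent_triples}
        for t in children:
            if source_value(t, child_source) in keys:
                placement[worker].append(t)
    return placement

# Hypothetical triples (not the contents of Table 7).
level1 = [("Bill", "uGradFrom", "MIT"), ("James", "uGradFrom", "CMU")]
level2 = [("Lisa", "advisor", "Bill"), ("John", "advisor", "James"),
          ("Fred", "advisor", "James")]

first = hash_distribute(level1, "object", 2)            # core ?univ is the object
second = collocate(first, "object", level2, "object")   # ?prof is the new source
for worker in sorted(second):
    print(worker, second[worker])
```

In this sketch every advisor triple ends up on the worker that holds the uGradFrom triple of the same professor, so the join between the two levels needs no communication.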


5.4 Queryload Monitoring 

To effectively monitor workloads, systems face the
following challenges: (i) the same query pattern may occur
with different constants, subquery orderings, and
variable names. Therefore, queries in the workload need
to be deterministically transformed into a representation
that unifies similar queries. (ii) This representation
needs to be updated incrementally with minimal
overhead. Finally, (iii) monitoring should be done at
the level of patterns, not whole queries. This allows the
system to identify common hot patterns among queries.

7 Recall that if a core vertex is a subject, we do not redistribute.

Heat map. We introduce a hierarchical heat map
representation to monitor workloads. The heat map is
maintained by the redistribution controller. Each query Q is
first decomposed into a redistribution tree using
Algorithm 2 (i.e., the procedure described in Section [?]).
The result is a tree T with the core vertex as root. To
detect overlap among queries, we transform T to a tree
template T' in which all the constants are replaced with
variables. To avoid losing information about constant
bindings in the workload, we store the constant values
and their frequencies as meta-data in the template
vertices. After that, T' is inserted in the heat map, which is
a prefix-tree-like structure that includes and combines
the tree templates of all queries. Insertion proceeds by
traversing the heat map from the root and matching
edges in T'. If an edge does not exist, we insert a new
edge in the heat map and set the edge count to 1;
otherwise, we increment the edge count. Furthermore, we
update the meta-data of vertices in the heat map with
the meta-data in the vertices of T'. For example, consider
queries Q1, Q2 and Q3 and their decompositions T1,
T2 and T3, respectively, in Figure 9(a) and (b). Assume
that each of the queries is executed once. The state of
the heat map after executing these queries is shown in
Figure 9(c). Every inserted edge updates the edge count
and the vertex meta-data in the heat map. For example,
edge (?v2, uGradFrom, ?v1) has edge count 3 because
it appears in all three trees. Furthermore, {MIT, 1} is added
to the meta-data of v1.
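
The insertion procedure can be sketched as a small prefix tree. This is an illustrative sketch, not the authors' data structure: the class name HeatNode, the (predicate, direction) edge key, the single-root simplification and the three sample trees are assumptions, and the sample trees are not the queries of Figure 9.

```python
from collections import Counter

class HeatNode:
    """A heat-map vertex: constants seen at this position plus counted edges."""
    def __init__(self):
        self.constants = Counter()   # constant value -> frequency (vertex meta-data)
        self.edges = {}              # (predicate, direction) -> [count, child HeatNode]

def is_variable(term):
    return term.startswith("?")

def insert(node, tree):
    """Insert a redistribution tree below `node`.

    `tree` is (vertex_term, [(predicate, direction, subtree), ...]).
    Variable names are ignored and constants are recorded only as meta-data,
    so a constant and a variable at the same position share one template vertex.
    """
    term, children = tree
    if not is_variable(term):
        node.constants[term] += 1
    for predicate, direction, subtree in children:
        key = (predicate, direction)
        if key not in node.edges:
            node.edges[key] = [0, HeatNode()]   # new edge, counted just below
        node.edges[key][0] += 1
        insert(node.edges[key][1], subtree)

# Three hypothetical redistribution trees sharing the first edge.
t1 = ("MIT", [("uGradFrom", "in", ("?prof", [("advisor", "in", ("?stud", []))]))])
t2 = ("?univ", [("uGradFrom", "in", ("?p", []))])
t3 = ("?u", [("uGradFrom", "in", ("?x", [("type", "out", ("Grad", []))]))])

root = HeatNode()
for t in (t1, t2, t3):
    insert(root, t)

shared = root.edges[("uGradFrom", "in")]
print(shared[0], dict(root.constants))
```

After the three insertions the shared uGradFrom edge has count 3 and the root carries the meta-data entry for MIT with frequency 1, which mirrors the kind of state described in the example above.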

Hot pattern detection. The redistribution controller
monitors queries by updating the heat map. As more
queries are executed, the controller identifies hot
patterns from the heat map. Currently, we use a hardwired
frequency threshold (footnote 8) for identifying hot patterns.
Once a hot pattern is detected, the redistribution controller
triggers the IRD process for that pattern. Remember
that patterns in the heat map are templates in which
all vertices are variables. To avoid excessive replication,
some variables are replaced by dominating constants
stored in the heat map. For example, assume the
selected part of the heat map in Figure 9(c) is identified
as hot. We replace vertex ?v3 with the constant Grad
because it is the dominant value. On the other hand,
?v1 is not replaced by MIT because MIT does not
dominate other values in query instances that include the
hot pattern. We use the Boyer-Moore majority vote
algorithm [Sj for deciding the dominating constant.
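
For readers unfamiliar with the Boyer-Moore majority vote, a minimal sketch follows. Treating "dominating" as a strict majority (more than half of the observed bindings) is an assumption of this sketch, as are the function name and the sample values; the paper does not spell out the exact criterion here.

```python
def dominating_constant(values):
    """Return the value occurring in more than half of `values`, else None.

    `values` must be a list (it is scanned twice).
    """
    candidate, count = None, 0
    for v in values:                 # pass 1: keep the only possible majority candidate
        if count == 0:
            candidate, count = v, 1
        elif v == candidate:
            count += 1
        else:
            count -= 1
    # pass 2: the candidate is only a majority if it really exceeds half
    occurrences = sum(1 for v in values if v == candidate)
    if values and occurrences * 2 > len(values):
        return candidate
    return None

print(dominating_constant(["Grad", "Grad", "Undergrad", "Grad"]))   # Grad
print(dominating_constant(["MIT", "CMU", "Stanford"]))              # None
```

The first pass uses constant extra memory, which fits a monitor that must be updated with minimal overhead; the second pass guards against reporting a candidate that is merely the last survivor.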


8 Auto-tuning the frequency threshold is a subject of our 
future work. 

Fig. 9 Updating the heat map. Selected areas indicate hot patterns. [Panel (a): The queries to be answered]

Fig. 10 A query and the pattern index that allows execution without communication. [Panel (d): Replica Index]


5.5 Pattern and Replica Index 

Pattern index. The pattern index is created and
maintained by the replication controller at the master. It has
the same structure as the heat map, but it only stores
redistributed patterns. For example, Figure 10(c) shows
the pattern index state after redistributing all patterns
in the heat map (Figure 9(c)). The pattern index is
used by the query planner to check if a query can be
executed without communication. When a new query Q
is posed, the planner transforms Q into a tree T. If the
root of T is also a root in the pattern index and all
of T's edges exist in the pattern index, then Q can be
answered in parallel mode; otherwise, Q is answered in
a distributed fashion. For example, the query in Figure
10(a) can be answered in parallel because its
redistribution tree (Figure 10(b)) is contained in the pattern
index. Edges in the pattern index are time-stamped at
every access to facilitate our eviction policy.
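
The planner's containment test can be sketched with nested dictionaries. The representation (a dictionary of roots, each mapping edge keys to sub-dictionaries), the function names and the sample patterns are assumptions chosen for illustration; time-stamping of edges is omitted.

```python
def contained(query_edges, index_edges):
    """True if every edge of the query tree exists on the same path in the index."""
    for key, subtree in query_edges.items():
        if key not in index_edges:
            return False
        if not contained(subtree, index_edges[key]):
            return False
    return True

def can_run_in_parallel(query_root, query_edges, pattern_index):
    """Parallel mode requires a known root and full containment of the tree."""
    return (query_root in pattern_index
            and contained(query_edges, pattern_index[query_root]))

# Hypothetical pattern index with one root and two redistributed branches.
pattern_index = {
    "?v1": {
        ("uGradFrom", "in"): {("advisor", "in"): {}},
        ("gradFrom", "in"): {},
    }
}

q_ok = {("uGradFrom", "in"): {("advisor", "in"): {}}}
q_miss = {("uGradFrom", "in"): {("teacherOf", "out"): {}}}

print(can_run_in_parallel("?v1", q_ok, pattern_index))     # True
print(can_run_in_parallel("?v1", q_miss, pattern_index))   # False
```

A query whose tree is only partly covered falls back to distributed evaluation in this sketch, matching the all-edges condition stated above.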

Replica index. The replica index at each worker is
identical to the pattern index at the master and is also
updated by the IRD process. However, each edge in
the replica index is associated with a storage module
similar to the one that stores the original data. Each
module stores only the replicated data of the specified
triple pattern. In other words, we do not add the
replicated data to the main indices nor keep all replicated
data in a single index. There are four reasons for this
segregation. (i) As more patterns are redistributed,
updating a single index becomes a bottleneck. (ii) Because
of replication, using one index mandates filtering
duplicate results. (iii) If data is coupled in a single index,
intermediate join results will be larger, which will
affect performance. Finally, (iv) this hierarchical
representation allows us to evict any part of the replicated
data quickly without affecting the overall system
performance. Notice that we do not replicate data associated
with triple patterns whose subjects are core vertices.
Such data are accessed from the main index directly
because of the initial subject-based hash partitioning.
Figure 10(d) shows the replica index that has the same
structure as the pattern index in Figure 10(c). The
storage module associated with (?v7, member, ?v6) stores
replicated triples that match the triple pattern.
Moreover, these triples qualify for the join with the triple
pattern of the parent edge.

Conflicting Replication and Eviction. Conflicts arise
when a subquery appears at different levels in the
pattern index. This may cause some triples to be replicated
more than once, by each hot pattern that includes them.
In terms of correctness, this is not a problem for AdHash
as conflicting triples (if any) are stored separately using
different storage modules. This approach avoids the burden
of any housekeeping and management of duplicates at
the cost of memory consumption. Nevertheless, AdHash
employs an LRU eviction policy that keeps the system
within a given replication budget at each worker.
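
A budget-bound LRU policy over per-pattern storage modules can be sketched as follows. The class name ReplicaStore, measuring the budget in number of triples, and evicting whole patterns at a time are assumptions of this sketch rather than details given in the text.

```python
from collections import OrderedDict

class ReplicaStore:
    """Per-worker replica modules, kept within a budget by LRU eviction."""

    def __init__(self, budget_triples):
        self.budget = budget_triples
        self.modules = OrderedDict()   # pattern -> replicated triples, oldest access first
        self.size = 0

    def access(self, pattern):
        """Record an access (the time-stamp update) by moving the pattern to the end."""
        if pattern in self.modules:
            self.modules.move_to_end(pattern)

    def add(self, pattern, triples):
        """Store a pattern's replicas, evicting least recently used patterns first.

        Returns the list of evicted patterns. A single pattern larger than the
        budget is still stored once the store is empty.
        """
        evicted = []
        if pattern in self.modules:
            self.size -= len(self.modules.pop(pattern))
        while self.modules and self.size + len(triples) > self.budget:
            old, old_triples = self.modules.popitem(last=False)
            self.size -= len(old_triples)
            evicted.append(old)
        self.modules[pattern] = list(triples)
        self.size += len(triples)
        return evicted

store = ReplicaStore(budget_triples=5)
store.add("A", [1, 2, 3])
store.add("B", [4, 5])
store.access("A")                      # A becomes the most recently used pattern
evicted = store.add("C", [6, 7])       # B is now the LRU pattern and is evicted
print(evicted, list(store.modules), store.size)
```

Because each pattern lives in its own module, eviction here is a single dictionary removal and never touches the data of other patterns, which is the property the segregated replica index is meant to provide.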

6 Experimental Evaluation 

We evaluate AdHash against existing systems. We also
include a non-adaptive version of our system, referred
to as AdHash-NA, which does not include the features
described in Section 5. In Section 6.1, we provide the
details of the data, the hardware setup, and the
competitors to our approach. In Section 6.2, we demonstrate
the low startup and initial replication overhead of
AdHash compared to all other systems. Then, in Section
6.3, we apply queries with different complexities on
different datasets to show that (i) AdHash leverages the
subject-based hash locality to achieve better or similar
performance compared to other systems and (ii) the
adaptivity feature of AdHash renders it several orders
of magnitude faster than other systems. In Section 6.4,
we conduct a detailed study of the effect and cost of
AdHash's adaptivity feature. The results show that our
system adapts incrementally to workload changes with
minimal overhead without resorting to full data
repartitioning. Finally, in Section 6.5, we study the data and
machine scalability of AdHash.

Table 8 Datasets Statistics in millions (M)

Dataset     Triples (M)  #S (M)  #O (M)    #S∩O (M)  #P     Indegree (Avg/StDev)  Outdegree (Avg/StDev)
LUBM-10240  1,366.71     222.21  165.29    51.00     18     16.54/26000.00        12.30/5.97
WatDiv      109.23       5.21    17.93     4.72      86     22.49/960.44          42.20/89.25
WatDiv-1B   1,092.16     52.12   179.09    46.95     86     23.69/2783.40         41.91/89.05
YAGO2       295.85       10.12   52.34     1.77      98     10.87/5925.90         56.20/71.96
Bio2RDF     4,644.44     552.08  1,075.58  491.73    1,714  8.64/21110.00         16.83/195.44


6.1 Setup and Competitors 

Datasets: We conducted our experiments using real
and synthetic datasets of variable sizes. Table 8
describes these datasets, where #S, #P, and #O denote,
respectively, the numbers of unique subjects, predicates,
and objects. We use the synthetic LUBM (footnote 9) data
generator to create a dataset of 10,240 universities consisting
of 1.36 billion triples. WatDiv (footnote 10) is a recent
benchmark that provides a wide spectrum of queries with
varying structural characteristics and selectivity classes. We
mainly used two versions of this dataset: WatDiv (109
million triples) and WatDiv-1B (1 billion triples). LUBM and
its template queries are used by most distributed
RDF engines [?] for testing their query evaluation
performance. However, LUBM queries are intended for
semantic inferencing and their complexities
lie in semantics, not structure. Therefore, we also use the
WatDiv dataset, which provides a wide range of query
complexities and selectivity classes. As both LUBM and
WatDiv are synthetic, we also use two real datasets.
YAGO2 (footnote 11) is a real dataset derived from Wikipedia,
WordNet and GeoNames containing 300 million triples.
The Bio2RDF dataset provides linked data for life sciences
using semantic web technologies. We use Bio2RDF (footnote 12)
release 2, which contains 4.64 billion triples connecting
24 different biological datasets.

9 http://swat.cse.lehigh.edu/projects/lubm/

10 http://db.uwaterloo.ca/watdiv/

11 http://yago-knowledge.org/

12 http://download.bio2rdf.org/release/2/

Hardware Setup: We implemented AdHash in C++
and used a Message Passing Interface library (MPICH2)
for synchronization and communication. Unless
otherwise stated, we deploy AdHash and its competitors on
a cluster of 12 machines, each with 148GB RAM and
two 2.1GHz AMD Opteron 6172 CPUs (12 cores each).
The machines run 64-bit 3.2.0-38 Linux Kernel and are
connected by a 10Gbps Ethernet switch.

Competitors: We compare our framework against
two recent in-memory RDF systems, Trinity.RDF [37]
and TriAD [15]. To the best of our knowledge, these
systems provide the fastest query response times.
However, they were not available to us for comparison; the
only way to compare against them is to use the
reported runtimes in the corresponding papers [15, 37].
Note that our testbed is slightly inferior to those used
in [15, 37]. In particular, Trinity.RDF uses a 40Gbps
InfiniBand interconnect, which is theoretically 4X faster
than ours. TriAD uses faster processors with a larger
number of cores, connected by a slower interconnect
(1Gbps Ethernet). Nonetheless, because of its
sophisticated partitioning scheme and join-ahead
pruning, TriAD communicates small amounts of data during
query evaluation (tens of megabytes). Therefore, using
a faster interconnect is not going to affect its
performance significantly on the datasets they used.

We also compare with two Hadoop-based systems
that employ lightweight partitioning: SHARD [?] and
H2RDF+ [?]. Furthermore, we compare to SHAPE (footnote 13)
[22], a system that relies on static replication and uses
RDF-3X as the underlying data store. We limit our
comparison to distributed systems only, because they
outperform state-of-the-art centralized RDF systems.

6.2 Startup Time and Initial Replication 

Our first experiment measures the time it takes all
systems to prepare the data prior to answering queries.
We exclude the string-to-id mapping time for all
systems. For TriAD, we show the time to partition the
graph using METIS [?]. We used the same number of
partitions reported in [?] for partitioning LUBM-10240
and WatDiv. The Bio2RDF and YAGO2 datasets are
partitioned into 200K and 38K partitions, respectively.
As Table 9 shows, METIS is prohibitively expensive and
does not scale for large RDF graphs. To apply METIS,
we had to remove all triples connected to literals;
otherwise, METIS takes several days to partition the
LUBM-10240, Bio2RDF and YAGO2 datasets.

13 SHAPE showed better replication and query performance
than H-RDF-3X [18]. Hence, we only compare to SHAPE.

Table 9 Preprocessing time (minutes)

         LUBM-10240  WatDiv  Bio2RDF  YAGO2
AdHash   14          1.2     115      4
METIS    523         66      4,532    105
SHAPE    263         79      >24h     251
SHARD    72          9       143      17
H2RDF+   152         9       387      22

We configured SHAPE with full level semantic hash
partitioning and enabled the type optimization (see [22]
for details). Furthermore, for fair comparison, SHAPE
is configured to partition each dataset such that all
its queries are processable without communication. For
LUBM-10240, SHAPE incurs less preprocessing time
compared to METIS-based systems. However, for
WatDiv and YAGO2, SHAPE performs worse because of
data imbalance, causing some of the RDF-3X engines
to take more time in building the databases.
In particular, partitioning YAGO2 and WatDiv using 2-hop
forward and 3-hop undirected, respectively, placed all the
data in a single partition. The reason for this behavior
is that all these datasets have uniform URIs and hence
SHAPE could not fully utilize its semantic hash
partitioning. SHAPE did not finish the partitioning phase
of Bio2RDF and was terminated after 24 hours.

SHARD and H2RDF+ employ lightweight
partitioning, random and range-based, respectively.
Therefore, they require less time compared to other systems.
However, since they are Hadoop-based, they suffer from
the overhead of storing the data first on the Hadoop File
System (HDFS) before building their data stores.

AdHash uses lightweight hash partitioning and avoids
the upfront cost of sophisticated partitioning schemes.
As Table 9 shows, AdHash starts from 4X up to two orders
of magnitude faster than existing systems.

We only report the initial replication of SHAPE and
TriAD, since AdHash, SHARD and H2RDF+ do not
incur any initial replication (the replication caused by
AdHash's adaptivity is evaluated in the next section).
TriAD replicates all edges that cross partition
boundaries, producing a 1-hop undirected guarantee.
Therefore, we consider the edge-cut reported by METIS to
be the amount of replication in TriAD. Table 10 shows
the replication ratio as a percentage of the original
data size. For LUBM-10240, TriAD results in the least
replication as LUBM is uniformly structured around
universities. With full level semantic hash partitioning
and type optimization, SHAPE incurs almost double
the replication of TriAD. For WatDiv, METIS produces
a very poor partitioning because of the dense nature of the
data. Consequently, TriAD results in excessive
replication because of the high edge-cut. Note that the
highest radius in all WatDiv query templates is 3
(undirected), and partitioning the whole data blindly
using a k-hop guarantee as in H-RDF-3X [18] would result
in excessive replication, which grows exponentially as
k increases. The same applies to the Bio2RDF and
YAGO2 datasets. SHAPE places the data on a single
partition because of the URI uniformity of WatDiv
and YAGO2. Therefore, it incurs no replication but
performs only as well as a single-machine RDF-3X store.

Table 10 Initial replication

        LUBM-10240  WatDiv         Bio2RDF  YAGO2
SHAPE   42.9%       (1 worker) 0%  NA       (1 worker) 0%
TriAD   23.6%       82.9%          30.0%    40.0%

Table 11 Query runtimes for LUBM-10240 (ms)

LUBM-10240   L1       L2       L3       L4       L5       L6       L7
AdHash       317      120      6        1        1        4        220
AdHash-NA    2,743    120      320      1        1        40       3,203
SHAPE        25,319   4,387    25,360   1,603    1,574    1,567    15,026
H2RDF+       285,430  71,720   264,780  24,120   4,760    22,910   180,320
SHARD        413,720  187,310  aborted  358,200  116,620  209,800  469,340
TriAD-SG     2,146    2,025    1,647    1        1        1        16,863
Trinity.RDF  7,000    3,500    6,000    4        3        10       27,500

6.3 Query Performance 

In this section, we compare AdHash's performance on
individual queries against state-of-the-art distributed
RDF systems using multiple real and synthetic datasets.
We demonstrate that even the AdHash-NA version of
our system (which does not include the adaptivity
feature) is competitive in performance with systems that
employ sophisticated partitioning techniques. This shows
that the subject-based hash partitioning and the
distributed evaluation techniques proposed in Section [?]
are very effective. When AdHash adapts, its
performance becomes even better and our system consistently
outperforms its competitors by a wide margin.

LUBM dataset: In the first experiment (Table 11),
we compare the performance of all systems using the
LUBM-10240 dataset and queries L1-L7 defined in [2]
and also used by Trinity.RDF and TriAD (footnote 14). For
SHAPE to execute all these queries without communication, we
use 2-hop forward semantic hash partitioning with the
type optimization. Queries can be classified based on
their structure and selectivities into simple and
complex. L4 and L5 are simple selective star queries, whereas
L2 is a simple yet non-selective star query that
generates large final results. L6 is considered a simple
query because it is highly selective. On the other hand,
L1, L3 and L7 are complex queries that generate large
intermediate results but return very small final results.

14 Recall from Section 6.1 that the numbers for Trinity.RDF
[37] and TriAD [15] are copied from their corresponding
papers because these systems are not publicly available.
Therefore, we only compare to them using the queries they used.

SHARD and H2RDF+ suffer from the expensive
overhead of MapReduce; hence, their performance is
significantly worse than all other systems. On the other
hand, SHAPE incurs minimal communication and
performs better than SHARD and H2RDF+ due to the
utilization of semantic hash partitioning. Because it uses
MapReduce for dispatching queries to workers, it still
suffers from the non-negligible overhead of MapReduce.

In-memory RDF engines, Trinity.RDF, TriAD-SG
and AdHash, perform significantly better than systems
based on MapReduce. Queries L4 and L5 are
selective subject star queries that produce very small
intermediate results. Therefore, in-memory systems can
solve these queries efficiently. AdHash exploits the
initial hash distribution and solves these queries without
communication, which explains why both versions of
AdHash have the same performance. Similarly, L2
consists of a single subject-subject join; however, AdHash
is faster than TriAD-SG and Trinity.RDF by more than
an order of magnitude. Due to the low selectivity of L2, the
exploration of Trinity.RDF does not reduce the
intermediate results size, leading to an expensive centralized
join by the master. TriAD, on the other hand, solves
the query by two distributed index scans (one for each
base subquery) followed by a distributed merge join.
AdHash performs better than TriAD-SG by avoiding
unnecessary scans. In other words, utilizing its hash
indexes and the right deep tree planning, AdHash requires
a single scan followed by hash lookups.

TriAD's pruning technique eliminates the
communication required for solving L6. Therefore, it significantly
outperforms Trinity.RDF and AdHash-NA. However,
once AdHash adapts, L6 is executed without
communication, resulting in a comparable performance to TriAD.

AdHash outperforms Trinity.RDF and TriAD-SG
for L1, L3 and L7. Even with simple hash
partitioning, AdHash-NA achieves better or comparable
performance to both in-memory systems. In particular, since
these queries are cyclic, Trinity.RDF cannot reduce
the size of their intermediate results.
All workers need to ship their intermediate results to
the master to finalize the query evaluation in a
centralized manner. Therefore, the master node is a potential
bottleneck, especially when the intermediate results are
huge, as in L7. Even with the sophisticated
partitioning and pruning technique of TriAD-SG, these queries
still require inter-worker communication, whereas
AdHash executes these queries in parallel without
communication. For L3, AdHash-NA is 5X to several orders
of magnitude faster than TriAD-SG and Trinity.RDF.
AdHash-NA evaluates the join that leads to an empty
intermediate result early, which lets it avoid
useless joins. However, the first few joins cannot be
eliminated at query planning time. On the other
hand, AdHash can detect queries with empty results
during planning. As each worker makes its local
parallel query plan, it detects that the cardinality of
the subquery in the replica index is zero and terminates.

Table 12 Query runtimes for WatDiv (ms)

WatDiv-100  Machines  L1-L5  S1-S7  F1-F5   C1-C3
AdHash      5         2      1      10      12
AdHash-NA   5         9      6      235     123
SHAPE       12        1,870  1,824  1,836   2,723
H2RDF+      12        5,441  8,679  18,457  65,786
TriAD       5         2      3      29      270

Table 13 Query runtimes for YAGO2 (ms)

YAGO2      Y1       Y2       Y3       Y4
AdHash     2.5      19       11       2
AdHash-NA  19       46       570      77
SHAPE      1,824    665,514  1,823    1,871
H2RDF+     10,962   12,349   43,868   35,517
SHARD      238,861  238,861  aborted  aborted

WatDiv dataset: The WatDiv benchmark defines 20
query templates (footnote 15) classified into four categories:
linear (L), star (S), snowflake (F) and complex queries (C).
Similar to TriAD, we generated 20 queries using the
WatDiv query generator for each query category C, F, L
and S. We deployed AdHash on five machines to match
the setting of TriAD in [15]. Table 12 shows the
performance of AdHash compared to other systems. For each
complexity family, we calculate the geometric mean of
each system. H2RDF+ performs worse than all other
systems due to the overhead of MapReduce. SHAPE,
under 2-hop forward partitioning, placed all the data
in one machine; therefore, its performance is no better
than a single-machine RDF-3X. AdHash and TriAD,
on the other hand, provide significantly better
performance than MapReduce-based systems. TriAD benefits
from its asynchronous message passing and performs
better than AdHash-NA in L, S, and F queries. For
complex queries with large diameters, AdHash-NA
performs better as a result of the locality awareness. When
AdHash adapts, it consistently performs better than all
systems for all queries.
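
As a reminder of how a per-family geometric mean is computed, here is a minimal sketch. The function name and the sample runtimes are hypothetical and are not measurements from Table 12.

```python
import math

def geometric_mean(runtimes_ms):
    """Geometric mean computed through logarithms.

    Assumes a non-empty sequence of strictly positive runtimes.
    """
    return math.exp(sum(math.log(x) for x in runtimes_ms) / len(runtimes_ms))

# Hypothetical runtimes (ms) of the queries in one complexity family.
family = [1.0, 10.0, 100.0]
gm = geometric_mean(family)
print(round(gm, 6))   # 10.0, whereas the arithmetic mean would be 37.0
```

The geometric mean dampens the influence of a single slow query, which is why it is a common summary when runtimes within a family span several orders of magnitude.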


15 http://db.uwaterloo.ca/watdiv/basic-testing.shtml 

Table 14 Query runtimes for Bio2RDF (ms)

Bio2RDF    B1       B2       B3       B4       B5
AdHash     4        2        2        2        1
AdHash-NA  19       16       36       187      1
H2RDF+     5,580    12,710   322,300  7,960    4,280
SHARD      239,350  309,440  512,850  787,100  112,280

Fig. 11 Impact of locality awareness on LUBM-10240. [Panels: (a) Execution Time, (b) Communication Cost]

YAGO dataset: YAGO2 does not provide benchmark
queries; therefore, we created a set of representative test
queries (Y1-Y4) defined in Appendix C. We show in
Table 13 the performance of AdHash against SHAPE,
H2RDF+ and SHARD. AdHash-NA continues to
significantly outperform other systems for all queries.
Furthermore, our adaptive version, AdHash, is up to two
orders of magnitude faster than all other systems.

Bio2RDF dataset: Similar to the YAGO2 dataset, the
Bio2RDF dataset does not have benchmark queries;
therefore, we defined five queries (B1-B5) that have
different structures and complexities. B1 requires an
object-object join, which contradicts our initial data
distribution. Queries B2 and B3 are star queries with different
numbers of triple patterns that require subject-object and/or
subject-subject joins. B5 is a simple star query with
only one triple pattern, while B4 is a complex query with
a 2-hop radius. We could not run SHAPE as it failed
to preprocess the data using 2-hop forward
partitioning within reasonable time. Similar to their behavior
on the LUBM-10240 and WatDiv datasets, H2RDF+ and
SHARD still perform worse than AdHash due to the
MapReduce overhead. Overall, AdHash outperforms all other
systems by orders of magnitude.


6.3.1 Impact of Locality Awareness 

In this experiment, we show the effect of locality-aware
planning on the distributed query evaluation of
AdHash-NA (non-adaptive). We define three configurations of
AdHash: (i) We disable the pinned subject
optimization and hash locality awareness. (ii) We disable the
pinned subject optimization while maintaining the hash
locality awareness; in other words, workers can still
know the locality of subject vertices but joins on the
pinned subjects are synchronized. Finally, (iii) we
enable all optimizations. We run the LUBM (L1-L7) queries
on the LUBM-10240 dataset on all configurations of
AdHash-NA. The query response times and the
communication costs are shown in Figures 11(a) and 11(b),
respectively. Disabling hash locality resulted in
excessive communication, which drastically affected the query
response times. Enabling the hash locality affected all
queries except L6 because of its high selectivity. The
performance gain for other queries ranges from 6X up
to 2 orders of magnitude. In the third configuration,
the pinned subject optimization does not affect the
amount of communication because of the hash
locality awareness. In other words, since the joining subject
is local, AdHash does not communicate intermediate
results. However, performance is affected by the
synchronization overhead. Queries like L2, L4 and L5 are
not affected by this optimization because they are star
queries joining on the subject. On the other hand, all
queries that require communication are affected. The
performance gain ranges from 26% in the case of L6 to more
than 90% for L3. The same behavior is also observed on
the WatDiv-1B dataset.

6.4 Workload Adaptivity by AdHash 

In this section, we thoroughly evaluate AdHash's
adaptivity. For this purpose, we define different workloads
on two billion-scale datasets with different
characteristics, namely, WatDiv-1B and LUBM-10240.

WatDiv-1B workload: The WatDiv benchmark
defines 20 query templates (footnote 16) classified into four
categories: linear (L), star (S), snowflake (F) and complex
queries (C). We used the benchmark query generator to create
a 5K-query workload from each category, resulting in
a total of 20K queries. We also generated a random
workload of 20K queries from all query templates.

LUBM-10240 workload: As AdHash and the other
systems do not support inferencing, we used all 14 queries
in the LUBM benchmark without reasoning (footnote 17). All
queries are listed in Appendix A. From these queries, we
generated 10K queries that have different constants. Then,
we randomly selected 20K queries from the 10K queries.
This workload covers a wide spectrum of query
complexities including simple selective queries, star queries
as well as queries with complex structures and low
selectivities. For details, refer to Appendix B.

16 http://db.uwaterloo.ca/watdiv/basic-testing.shtml

17 Only query patterns are used. Classes and properties are
fixed so queries return large intermediate results.

Fig. 12 Frequency threshold sensitivity analysis. [Panels: (a) Execution time, (b) Communication cost, (c) Replication cost; x-axis: Frequency Threshold]


6.4.1 Frequency Threshold Sensitivity Analysis

The frequency threshold controls the triggering of the
Incremental ReDistribution (IRD) process. Consequently,
it influences the execution time and the amount of
communication and replication in the system. In this
experiment, we conduct an empirical sensitivity analysis
to select the frequency threshold value based on the
two aforementioned query workloads. We execute each
of the workloads while varying the frequency threshold
values from 1 to 30. Note that our frequency
monitoring is not on a query-by-query basis, as our heat map
monitors the frequency of the subquery pattern in a
hierarchical manner (see Section 5.4). The workload
execution times, the communication costs and the resulting
replication ratios are shown in Figures 12(a), 12(b) and
12(c), respectively.


We observe that LUBM-10240 is very sensitive to
slight changes in the frequency threshold because of the
complexity of its queries. As the frequency threshold
increases, the redistribution of hot patterns is delayed,
causing more queries to be executed with
communication. Consequently, the amount of communication and
synchronization overhead in the system increases,
affecting the overall execution time, while the replication
ratio is low because fewer patterns are redistributed.

On the other hand, WatDiv-1B is not as sensitive
to this range of frequency thresholds because most of
its queries are solved in sub-second time using our
locality-aware distributed semi-join algorithm and do not
incur excessive communication. Nevertheless, as the
frequency threshold increases, the synchronization
overhead affects the overall execution time. Furthermore,
due to our fine-grained query monitoring, AdHash
captures the commonalities between the WatDiv-1B query
templates for frequency thresholds 5 to 30. Hence, for
all these thresholds the replication ratio remains almost
the same. The difference is that the system will converge
faster for lower threshold values, reducing the overall
execution time. In all subsequent experiments, we use
a frequency threshold of 10 as it resulted in a good
balance between time and replication. We plan to study
the auto-tuning of this parameter in the future.

Fig. 13 AdHash adapting to workload (WatDiv-1B). [Panel (b): Communication cost]


6.4.2 Workload Execution Cost

To simulate a change in the workload, queries of the
same WatDiv-1B template are run consecutively while
enforcing a replication threshold of 20% per worker.
Figure 13(a) shows the cumulative time as the
execution progresses with and without the adaptivity feature.
After every sequence of 5K query executions, the type
of queries changes. Without adaptivity (i.e.,
AdHash-NA), the cumulative time increases sharply as long as
complex queries are executed (e.g., from query 2K to
query 10K). On the other hand, AdHash adapts to the
change in workload with little overhead, causing the
cumulative time to drop significantly by almost 6 times.

Fig. 14 AdHash adapting to workload (LUBM-10240).

Fig. 15 Comparison with static representative workload-based partitioning.

Figure 13(b) shows the cumulative communication costs of
both AdHash and AdHash-NA. As we can see, the communication
cost exhibits the same pattern as that of the runtime cost
(Figure 13(a)), which confirms that communication and
synchronization overheads are detrimental to the total
query response time. The overall communication cost of
AdHash is more than 7X lower than that of AdHash-NA. Once
AdHash starts adapting, most future queries are solved with
minimal or no communication. The same behavior is observed
for the LUBM-10240 workload (see Figures 14(a) and 14(b)).

Partitioning based on a representative workload: 

We tried to partition these datasets based on a rep¬ 
resentative workload using Partout m • However, it 
could not partition the data using a large workload 
within a reasonable time (<24 hours). Consequently, 
in this experiment, we simulate the effect of assuming 
a representative workload when partitioning the data 
using AdHash. To do so, we train AdHash using differ¬ 
ent combinations of the different workload categories 


defined by WatDiv-1B (C, F, S, and L). Each combination is
made of two categories, effectively producing six
combinations, namely CF, CL, CS, FL, FS, and LS. After
training AdHash, we test the system using a random workload
of 20K queries selected from all query categories. This
way, some of the queries in the test workload run in
parallel while others (not in the representative workload)
require communication. In Figures 15(a) and 15(b) we show
the cumulative execution time and communication,
respectively, for the test workloads (i.e., excluding the
training time). In the same figures, we show the
performance of AdHash without training. The performance of
the test workload depends heavily on the complexity of the
queries used in the training phase. For example, the
complex (C) and snowflake (F) queries are the most
expensive queries in the benchmark. Therefore, when the
system is trained using the CF training workload, it
performs much better than when trained using the LS
workload. On the other hand, allowing the system to adapt
incrementally and dynamically (without training) resulted
in better performance than all trained cases. AdHash incurs
more communication at the beginning because of the IRD
process. However, it then converges to almost constant
communication. The CF workload requires less communication
because the L and S queries (not in the training workload)
do not require excessive data exchange. Nonetheless, the CF
execution time keeps increasing due to the remaining
communication and synchronization overheads.
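The six training combinations are simply the two-element subsets of the four WatDiv-1B categories. The short sketch below enumerates them and, for each, lists the categories that appear only in the test workload; the variable names are illustrative.

```python
from itertools import combinations

# The four WatDiv-1B workload categories named in the text.
categories = ["C", "F", "L", "S"]

# Every unordered pair of categories forms one training workload.
training_sets = ["".join(pair) for pair in combinations(categories, 2)]
print(training_sets)  # ['CF', 'CL', 'CS', 'FL', 'FS', 'LS']

# Categories absent from training; their test queries may need communication.
unseen = {t: [c for c in categories if c not in t] for t in training_sets}
print(unseen["CF"])  # ['L', 'S']
```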
6.4.3 Redistribution Tree Generation

In this experiment, we evaluate our query transformation
heuristic (Section 5.1) against two alternative approaches.
Recall that when transforming a hot query pattern into the
redistribution tree, we select the vertex with the highest
score to be the tree root. Then the query is traversed from
high score vertices to lower score ones. Therefore, we
compare our heuristic (referred to as High-Low hereafter)
to two different heuristics: (i) the vertex with the lowest
vertex score is selected as the core; then the query
pattern is traversed by exploring vertices with lower
scores first. We refer to this heuristic as Low-High. We
also compare to (ii) another approach that uses a different
vertex scoring function, where the score of a vertex in the
hot query pattern is its out-degree. The pattern is then
traversed from high score vertices to lower score ones. We
refer to this approach as QDegree. Note that the latter
approach aims at minimizing the replication in a greedy
manner by fully exploiting the initial hash partitioning.
Recall that data that binds to triple patterns whose
subject is a core is not replicated.
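The three traversal orders can be contrasted with a small sketch. It is a simplification under stated assumptions: the vertex scores are hypothetical inputs (the actual scoring function is the one defined in Section 5.1), the traversal is a greedy best-first walk that visits each vertex once, and the function names `build_redistribution_tree` and `out_degree_scores` are illustrative rather than taken from AdHash.

```python
import heapq


def out_degree_scores(edges):
    """QDegree-style scoring: a vertex's score is its number of outgoing edges."""
    scores = {}
    for s, _, o in edges:
        scores[s] = scores.get(s, 0) + 1
        scores.setdefault(o, 0)
    return scores


def build_redistribution_tree(edges, score, high_first=True):
    """Pick the best-scoring vertex as root, then expand best-scoring neighbours first."""
    adj = {}
    for s, p, o in edges:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))
    sign = -1 if high_first else 1          # heapq is a min-heap
    root = min(adj, key=lambda v: (sign * score[v], v))
    visited, tree = {root}, []
    frontier = [(sign * score[n], n, root, p) for p, n in adj[root]]
    heapq.heapify(frontier)
    while frontier:
        _, v, parent, pred = heapq.heappop(frontier)
        if v in visited:
            continue
        visited.add(v)
        tree.append((parent, pred, v))
        for p, n in adj[v]:
            if n not in visited:
                heapq.heappush(frontier, (sign * score[n], n, v, p))
    return root, tree


# A small acyclic pattern (subject, predicate, object) with made-up scores.
edges = [("?s", "ub:memberOf", "?d"),
         ("?d", "ub:subOrganizationOf", "?u"),
         ("?s", "ub:emailAddress", "?e")]
score = {"?s": 5, "?d": 3, "?u": 1, "?e": 2}

root_hl, tree_hl = build_redistribution_tree(edges, score, high_first=True)
root_lh, tree_lh = build_redistribution_tree(edges, score, high_first=False)
root_qd, _ = build_redistribution_tree(edges, out_degree_scores(edges))
print(root_hl, root_lh, root_qd)  # ?s ?u ?s
```

In this toy pattern High-Low and the out-degree scoring pick the same root, while Low-High starts from the lowest-scoring vertex and reaches the rest of the pattern through it.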

We evaluated all these heuristics by running the LUBM-10240
workload. In Figure 16(a) we show the resulting
replication, the communication cost and the amount of data
touched by the IRD process. The Low-High and the QDegree
heuristics resulted in slightly less replication compared
to the High-Low approach. The reason is that both
heuristics benefit from the initial hash partitioning by
selecting cores with a larger number of outgoing edges.
However, the amount of data touched by the redistribution
process (i.e., data in the main and replica indices) in
Low-High and QDegree is significantly higher. This affects
the adaptivity performance because the IRD process is
carried out using a series of DSJ iterations. Furthermore,
because the data touched by the process is actually used
for evaluating parallel queries, the performance of
parallel queries is eventually affected.

Consequently, the cumulative workload execution time using
the High-Low heuristic is 1.9X faster than the other
heuristics, as shown in Figure 16(b). Since the QDegree and
Low-High heuristics touch and communicate almost the same
amount of data, their cumulative execution times are also
almost the same. Besides, note that the QDegree heuristic
does not use any statistical information from the data and
only relies on the structure of the hot query pattern.
Therefore, a redistributed pattern would not benefit other
future queries with a slightly different structure. We
repeated the experiment on WatDiv-1B and all heuristics
resulted in almost the same communication cost, wall time,
and touched data. This time, QDegree resulted in the least
replication because it best exploits the initial
subject-based hash partitioning.

Fig. 16 Effect of hot pattern transformation.


Table 15 Load Balancing in AdHash

              Percentage of triples
Dataset       Max     Min     Average  StDev (σ)  Replication Ratio
LUBM-10240    1.43%   1.35%   1.39%    0.02       0.73
WatDiv-1B     1.58%   1.20%   1.33%    0.07       0.36


6.4.4 Replication and Load Balancing

In this experiment, we evaluate the load balancing of
AdHash from two different perspectives: (i) data balancing,
in which we consider how balanced the initial partitioning
is, as well as the replication that results from the IRD
process; and (ii) work balancing, in which we consider how
the evaluation cost is balanced among all workers in the
system during the execution of the workload. In Table 15,
we report some statistics that characterize the data load
balance in AdHash. In particular, we report the average and
standard deviation (σ) of the percentage of triples stored
at each worker. As shown in the table, AdHash achieves a
very good data balance for both workloads because of the
initial subject-based hash partitioning as well as the
hashing used during the IRD process. As a result of the
data balance, work is also well balanced among workers;
i.e., the amount of work contributed by each worker in the
system is almost the same, as shown in Figures 17(a) and
17(b) for LUBM-10240 and WatDiv-1B, respectively.

Fig. 17 Workload balance.

Fig. 18 AdHash scalability using LUBM dataset.
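The data-balance statistics described above can be computed as in the following sketch. It assumes the input is a list of triple counts, one per worker, and uses the population standard deviation; the text does not state which variant was used, and the function name `balance_stats` is illustrative.

```python
from statistics import mean, pstdev


def balance_stats(triples_per_worker):
    """Max, min, average and standard deviation of the per-worker share (in %)."""
    total = sum(triples_per_worker)
    pct = [100.0 * n / total for n in triples_per_worker]
    return {
        "max": max(pct),
        "min": min(pct),
        "avg": mean(pct),
        "stdev": pstdev(pct),   # assumption: population, not sample, deviation
    }


# Hypothetical, perfectly balanced cluster of 72 workers.
stats = balance_stats([1000] * 72)
print(round(stats["avg"], 2), stats["stdev"])  # 1.39 0.0
```

With perfectly even partitioning every worker holds 100/72 ≈ 1.39% of the triples and the deviation is zero, so the closer the reported maximum and minimum are to the average, the better the balance.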


6.5 Scalability 


Data Scalability We use the LUBM benchmark data generator
to generate six datasets of different sizes: LUBM-160,
LUBM-320, LUBM-640, LUBM-1280, LUBM-2560 and LUBM-5120. We
keep the number of workers fixed to 72 (6 workers per
machine). Figures 18(a) and 18(b) show the data scalability
of AdHash and AdHash-NA for simple and complex queries,
respectively. L4, L5, and L6 are simple queries that are
very selective and touch the same amount of data regardless
of the data size. This explains the steady performance of
both AdHash and AdHash-NA for these queries. Because L2 is
not selective and returns massive final results, it is
inevitable for its scalability to degrade as the data size
increases. Figure 18(b) shows the scalability of AdHash for
complex queries. Queries L1 and L7 generate a large number
of intermediate results, causing high communication cost,
which explains the poor scalability of AdHash-NA on these
queries. Nevertheless, as AdHash adapts to the workload,
many queries are evaluated in parallel mode much faster.

Strong Scalability Using LUBM-10240, we fixed the 
workload and increased the number of workers. Due
to the adaptivity of AdHash, communication is minimized,
leading to nearly optimal scalability. Figure
shows the scalability of parallel queries as we increase 
the number of workers. 
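Strong scalability is commonly summarized by speedup and parallel efficiency relative to the smallest configuration. The sketch below shows one conventional way to compute both; the timings are made-up placeholders, not measurements from these experiments, and the function name `strong_scaling` is illustrative.

```python
def strong_scaling(times_by_workers):
    """Return {workers: (speedup, efficiency)} relative to the smallest setup."""
    base_w = min(times_by_workers)
    base_t = times_by_workers[base_w]
    result = {}
    for w, t in sorted(times_by_workers.items()):
        speedup = base_t / t                  # how much faster than the baseline
        efficiency = speedup / (w / base_w)   # 1.0 means ideal (linear) scaling
        result[w] = (speedup, efficiency)
    return result


# Placeholder timings in seconds for a fixed workload (NOT measured values).
result = strong_scaling({18: 80.0, 36: 40.0, 72: 20.0})
print(result[72])  # (4.0, 1.0)
```

An efficiency close to 1.0 as workers are added corresponds to the nearly optimal scalability described above; communication overhead would show up as efficiency dropping below 1.0.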

7 Conclusion 

In this paper, we presented AdHash, an adaptive dis¬ 
tributed RDF engine. Using lightweight partitioning 
that hashes triples on the subjects, AdHash exploits 
query structures and the hash-based data locality in or¬ 
der to minimize the communication cost during query 
evaluation. Furthermore, AdHash monitors the query 
workload and incrementally redistributes parts of the 
data that are frequently accessed by hot patterns. By 
maintaining and indexing these patterns, many future 
queries are evaluated without communication. The adap¬ 
tivity feature of AdHash complements its excellent per¬ 
formance on queries that can benefit from its hash- 
based data locality; i.e., frequent query patterns that 
are not favored by the partitioning (e.g., star joins
on an object) can be processed in parallel due to
AdHash's adaptivity.

Our experimental results verify that AdHash achieves 
better partitioning and replicates less data than its com¬ 
petitors. More importantly, AdHash scales to very large 
RDF graphs and consistently provides superior perfor¬ 
mance by adapting to dynamically changing workloads. 
Currently, we are investigating the possibility of
utilizing AdHash for general (i.e., non-RDF) graphs and for
operators such as graph traversals or reachability queries.
A LUBM Benchmark Queries

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX y: <http://yago-knowledge.org/resource/>

Q1: SELECT ?X WHERE { ?X rdf:type ub:GraduateStudent .
?X ub:takesCourse
<http://www.Department0.University0.edu/GraduateCourse0> . }

Q2: SELECT ?X ?Y ?Z WHERE { ?X rdf:type ub:GraduateStudent .
?Y rdf:type ub:University . ?Z rdf:type ub:Department .
?X ub:memberOf ?Z . ?Z ub:subOrganizationOf ?Y .
?X ub:undergraduateDegreeFrom ?Y . }

Q3: SELECT ?X WHERE { ?X rdf:type ub:Publication .
?X ub:publicationAuthor
<http://www.Department0.University0.edu/AssistantProfessor0> . }

Q4: SELECT ?X ?Y1 ?Y2 ?Y3 WHERE { ?X rdf:type ub:AssociateProfessor .
?X ub:worksFor <http://www.Department0.University0.edu> .
?X ub:name ?Y1 . ?X ub:emailAddress ?Y2 . ?X ub:telephone ?Y3 . }

Q5: SELECT ?X WHERE { ?X rdf:type ub:UndergraduateStudent .
?X ub:memberOf <http://www.Department0.University0.edu> . }

Q6: SELECT ?X WHERE { ?X rdf:type ub:UndergraduateStudent . }

Q7: SELECT ?X ?Y WHERE { ?X rdf:type ub:UndergraduateStudent .
?Y rdf:type ub:Course . ?X ub:takesCourse ?Y .
<http://www.Department0.University0.edu/AssociateProfessor0>
ub:teacherOf ?Y . }

Q8: SELECT ?X ?Y ?Z WHERE { ?X rdf:type ub:UndergraduateStudent .
?Y rdf:type ub:Department . ?X ub:memberOf ?Y .
?Y ub:subOrganizationOf <http://www.University0.edu> .
?X ub:emailAddress ?Z . }

Q9: SELECT ?X ?Y ?Z WHERE { ?X rdf:type ub:GraduateStudent .
?Y rdf:type ub:AssociateProfessor . ?Z rdf:type ub:GraduateCourse .
?X ub:advisor ?Y . ?Y ub:teacherOf ?Z . ?X ub:takesCourse ?Z . }

Q10: SELECT ?X WHERE { ?X rdf:type ub:TeachingAssistant .
?X ub:takesCourse
<http://www.Department0.University0.edu/GraduateCourse0> . }

Q11: SELECT ?X WHERE { ?X rdf:type ub:ResearchGroup .
?X ub:subOrganizationOf ?Z .
?Z ub:subOrganizationOf <http://www.University0.edu> . }

Q12: SELECT ?X ?Y WHERE { ?Y rdf:type ub:Department .
?X ub:headOf ?Y .
?Y ub:subOrganizationOf <http://www.University0.edu> . }

Q13: SELECT ?X WHERE { ?X rdf:type ub:GraduateStudent .
?X ub:undergraduateDegreeFrom <http://www.University0.edu> . }

Q14: SELECT ?X WHERE { ?X rdf:type ub:GraduateStudent . }


B LUBM Workload 

We generated a workload of 20,000 queries from the LUBM
benchmark queries shown in Appendix A. For queries that do
not have constants (Q2 and Q9), we generate different query
patterns by removing some triples and mutating the node
types. For example, in Q2, we generated 18 different
patterns by alternating the student type between
UndergraduateStudent and GraduateStudent (see Table 16).
Similarly, other query patterns are generated by removing
different combinations of the query triple patterns. We did
not generate variations of Q6 and Q14 as they have only one
triple pattern (rdf:type) with a single constant. For the
rest of the queries, we generated 1000 different patterns
from each query by varying the values of the query
constants. For example, in Q1, we generate different query
patterns by varying the values of both the student type
(UndergraduateStudent or GraduateStudent) and the graduate
courses.


Table 16 LUBM Workload

Query  Patterns  Changes
Q1     1000      Constants
Q2     18        Structure/Constants
Q3     1000      Constants
Q4     1000      Constants
Q5     1000      Constants
Q6     1         No Changes
Q7     1000      Constants
Q8     1000      Constants
Q9     30        Structure/Constants
Q10    1000      Constants
Q11    1000      Constants
Q12    1000      Constants
Q13    1000      Constants
Q14    1         No Changes


C YAGO2 Queries

Y1: SELECT ?GivenName ?FamilyName WHERE {
?p y:hasGivenName ?GivenName . ?p y:hasFamilyName ?FamilyName .
?p y:wasBornIn ?city . ?p y:hasAcademicAdvisor ?a .
?a y:wasBornIn ?city . }

Y2: SELECT ?GivenName ?FamilyName WHERE {
?p y:hasGivenName ?GivenName . ?p y:hasFamilyName ?FamilyName .
?p y:wasBornIn ?city . ?p y:hasAcademicAdvisor ?a .
?a y:wasBornIn ?city . ?p y:isMarriedTo ?p2 .
?p2 y:wasBornIn ?city . }

Y3: SELECT ?name1 ?name2 WHERE {
?a1 y:hasPreferredName ?name1 . ?a2 y:hasPreferredName ?name2 .
?a1 y:actedIn ?movie . ?a2 y:actedIn ?movie . }

Y4: SELECT ?name1 ?name2 WHERE {
?p1 y:hasPreferredName ?name1 . ?p2 y:hasPreferredName ?name2 .
?p1 y:isMarriedTo ?p2 . ?p1 y:wasBornIn ?city .
?p2 y:wasBornIn ?city . }

D Bio2RDF

B1: SELECT ?o WHERE {
<http://bio2rdf.org/pubmed_resource:1374967_INVESTIGATOR_1>
<http://bio2rdf.org/pubmed_vocabulary:last_name> ?o .
<http://bio2rdf.org/pubmed_resource:1374967_AUTHOR_1>
<http://bio2rdf.org/pubmed_vocabulary:last_name> ?o . }

B2: SELECT ?articleToMesh WHERE {
<http://bio2rdf.org/pubmed:126183>
<http://bio2rdf.org/pubmed_vocabulary:mesh_heading> ?articleToMesh .
?articleToMesh
<http://bio2rdf.org/pubmed_vocabulary:mesh_descriptor_name> ?mesh . }

B3: SELECT ?phenotype WHERE {
?phenotype rdf:type <http://bio2rdf.org/omim_vocabulary:Phenotype> .
?phenotype rdfs:label ?label .
?gene <http://bio2rdf.org/omim_vocabulary:phenotype> ?phenotype . }

B4: SELECT ?pharmgkbid WHERE {
?pharmgkbid <http://bio2rdf.org/pharmgkb_vocabulary:xref>
<http://bio2rdf.org/drugbank:DB00126> .
?pharmgkbid <http://bio2rdf.org/pharmgkb_vocabulary:xref> ?pccid .
?DDIassociation <http://bio2rdf.org/pharmgkb_vocabulary:chemical> ?pccid .
?DDIassociation <http://bio2rdf.org/pharmgkb_vocabulary:event> ?DDIevent .
?DDIassociation <http://bio2rdf.org/pharmgkb_vocabulary:chemical> ?drug2 .
?DDIassociation <http://bio2rdf.org/pharmgkb_vocabulary:p-value> ?pvalue . }

B5: SELECT ?interaction WHERE {
?interaction <http://bio2rdf.org/irefindex_vocabulary:interactor_a>
<http://bio2rdf.org/uniprot:O17680> . }






